Chapter 1

Introduction 


1.1 Purpose 

The primary intent of this manual is to help you troubleshoot hardware problems
in a CM-5 system. It contains a variety of information for this purpose, including
descriptions of the system hardware, a description of the error reporting system,
descriptions of the diagnostic tools provided for troubleshooting hardware faults,
and recommended diagnostic procedures.

How you use this manual will largely depend on how experienced you are with 
CM-5 systems. 

■ If you are new to the CM-5 and its diagnostic software environment, 
cmdiag, you should read through the entire manual at least once before 
you have occasion to use it. Then, when a troubleshooting situation arises, 
follow the basic troubleshooting procedure described in Chapter 2. This
procedure will get you through the initial diagnostic steps and will guide 
you in using other sections of the manual as the particular troubleshooting 
session requires. 

■ If you have a good understanding of the system architecture and experience
using cmdiag, you can treat this document as a reference manual,
consulting it only for specific details.

NOTE: This manual assumes that you have received formal training on CM-5
system administration and maintenance issues. It does not provide comprehensive
documentation of these topics.
1.2 General Troubleshooting Practices

The practices listed below have been found to promote more efficient
troubleshooting in virtually all situations. They either simplify the
troubleshooting task or help avoid introducing new problems as old ones
are investigated.

1. Gather initial information. Before taking any active diagnostic
steps, gather as much information about the failure as possible. Some
questions that often uncover useful clues are listed below.

■ If the failure occurred while running a user program, ask if the
program has run successfully on this CM-5 before. If so, has the CM
changed in any way since then? If the answer to both questions is
yes, find out what changes it has undergone since the program last
ran successfully.

■ Has the user program run successfully on a different partition of this
CM? If the answer is yes, focus attention on the hardware associated
with the failing partition.

■ Likewise, has the program run successfully on another CM-5 system?
If so, is that system different in any way (software version,
hardware configuration, ECO levels, etc.)? If yes, consider the
implications of those differences.

■ Have any other programs run successfully on the same CM partition?
If yes, examine the differences between the successful and
unsuccessful programs. For example, are the memory requirements of
one program significantly different from those of the other?

2. Check for simple solutions first. Check the obvious conditions, such
as power supply or cooling fan failure. While these checks seldom lead 
to an immediate fix, they will avoid unnecessary troubleshooting time 
and effort on those few occasions when the solution is simple. 

3. Change as little as possible. Every modification to hardware has the
inherent risk that it will introduce a new problem. When changes are
unavoidable, try to adhere to the following guidelines:

■ Change one thing at a time and record all changes. 

■ When you make a change that does not fix the problem, undo the 
change before progressing to the next step, particularly if that step 
involves making another change. 
■ If diagnostic messages point to a specific circuit board, check the
board's seating before replacing it. Poor electrical contact caused
by contaminated edge connectors or by inadequate seating is a common
cause of faulty performance. Reseating a circuit board will
help clean the metal surfaces and re-establish solid contact. After
reseating a board, retest to see if the fault was corrected.

4. Swap boards before changing cables. Use board swapping for fault
isolation before changing cable connections. In a system that has been 
running successfully, cable faults are much less likely than component 
or board failures. In addition, disconnecting and reconnecting cables 
poses more risk of causing a new problem than replacing circuit boards. 

5. For intermittent problems, increase the length of test runs to stress 
the hardware being tested. 

6. ALWAYS WEAR ANTI-STATIC PROTECTION WHEN HANDLING
CIRCUIT BOARDS. STORE BOARDS IN ANTI-STATIC
BAGS.
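
Practice 3 above amounts to a simple discipline: only one unresolved change is in place at any time, and every action is recorded. The following is a minimal sketch of such a record. The ChangeLog class and its method names are hypothetical helpers invented for this illustration; they are not part of cmdiag or any other CM-5 tool.

```python
class ChangeLog:
    """Track hardware changes one at a time (hypothetical helper, not a CM-5 tool)."""

    def __init__(self):
        self.pending = None   # the one change made but not yet resolved
        self.history = []     # every action taken, in order

    def apply(self, description):
        # Refuse a second change while an earlier one is still unresolved.
        if self.pending is not None:
            raise RuntimeError("resolve first: " + self.pending)
        self.pending = description
        self.history.append(("apply", description))

    def resolve(self, fixed):
        # A change that fixed the fault stays in place; any other is undone.
        if self.pending is None:
            raise RuntimeError("no change is pending")
        action = "keep" if fixed else "undo"
        self.history.append((action, self.pending))
        self.pending = None
        return action
```

Calling apply() twice in a row raises an error, which mirrors the rule that a change that did not fix the problem must be undone before the next one is tried.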
1.3 Summary of Diagnostic Tools

Various diagnostic tools are provided for troubleshooting hardware failures on
the CM. The following list summarizes these tools and indicates where you can
find explanations of their use.

■ Use kpndbx to investigate Processing Node failures. The procedure for
using kpndbx is described in Appendix A. Its man page is provided in
Appendix M.

■ The primary diagnostic tool for investigating hardware faults in the CM-5
is cmdiag. Its use is described in Appendix B. The cmdiag man page is
in Appendix M.

■ A subset of cmdiag tests target IOBA hardware. These tests are
augmented by several independent test packages, which exercise the different
I/O devices and their interconnecting hardware. The various I/O-related
diagnostic tools are described in Appendix F.

■ A number of system verifiers are available for exercising the CM across
functional boundaries. These provide comprehensive coverage of system
functions by closely emulating the behavior of user applications. These
verifiers are described in Appendix H.
Chapter 2

Troubleshooting Fundamentals 


Often, the initial stage of a troubleshooting session — deciding what action to 
take first — can be the most difficult. This chapter offers a brief set of guidelines 
for dealing with this early phase. 

The steps presented below offer a rational opening strategy for troubleshooting 
CM-5 hardware faults, regardless of the source of the failure. Figure 1 illustrates 
the key points in this procedure. 

NOTE: As a matter of convenience, you can have cmdiag running on the full
system at all times. This cmdiag would not be associated with any individual
partition (i.e., it would not be invoked with the -p option) and would therefore
not interfere with timesharing daemons running on partitions. Appendix M
contains the cmdiag man page.

1. If cmdiag is not already running, invoke it on the entire CM (do not use
the -p option).

2. Run find-cm-error. See Appendix K for a description of find-cm-error
output.

3. The next step depends on what find-cm-error reports.

■ If no errors are reported, the problem may be in a Processing Node
or in an area of I/O hardware that is not accessible to the diagnostic
network. In either case, PN registers may provide useful status
information. Run kpndbx to read PN status. Appendix A describes
the kpndbx procedure.

■ If Control Network errors are reported, use the troubleshooting
procedure described in Appendix D, Tracing Control Network Errors.

■ If Data Network errors are reported, use the troubleshooting
procedure described in Appendix E, Tracing Data Network Errors.

■ If find-cm-error points to I/O hardware, use the troubleshooting
tools described in Appendix F.

■ If all chips on a backplane or on multiple backplanes report errors,
the source of the problem can be a faulty power supply, system
clock, or diagnostic network. Appendix G describes the procedure
for troubleshooting symptoms of this kind.
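
The branching in step 3 can be sketched as a small lookup table. This is only an illustration: the category labels ("control_network" and so on) are hypothetical names chosen for the sketch, and it does not parse real find-cm-error output, whose format is described in Appendix K.

```python
# Category labels are hypothetical names chosen for this sketch; the real
# find-cm-error output format is described in Appendix K.
NEXT_STEP = {
    "control_network": "Appendix D: Tracing Control Network Errors",
    "data_network": "Appendix E: Tracing Data Network Errors",
    "io_hardware": "Appendix F: I/O troubleshooting tools",
    "whole_backplane": "Appendix G: power supply, system clock, diagnostic network",
}
NO_ERRORS = "Appendix A: run kpndbx to read PN status"


def next_steps(categories):
    """Return where to go next for the error categories that were reported."""
    categories = set(categories)
    unknown = categories - set(NEXT_STEP)
    if unknown:
        raise ValueError("unrecognized categories: " + ", ".join(sorted(unknown)))
    if not categories:
        # No errors reported: fall back to reading PN status.
        return [NO_ERRORS]
    # Preserve the order in which the manual lists the cases.
    return [NEXT_STEP[name] for name in NEXT_STEP if name in categories]
```

For example, next_steps(["data_network"]) points to Appendix E, and an empty list points to the kpndbx procedure in Appendix A.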


October 9, 1992
Primary Troubleshooting Decision Tree

Figure 1. Strategy for initial phase of troubleshooting session.
Chapter 3

Preventive Maintenance 


3.1 Summary 

The CM-5 preventive maintenance program is intended to expose incipient
hardware faults in a controlled setting, reducing the likelihood of failures occurring
while user code is executing. There are two schedules in the CM-5 preventive
maintenance program: a short, less comprehensive daily routine and a longer,
more rigorous weekly procedure.

■ The daily routine implements the Processing Node test group, the Data
Network verifier group, and the Control Network verifier group. This
procedure is illustrated in Figure 2; detailed descriptions of each step are
provided in Section 3.2 and are cross-referenced in the margin of Figure 2.

■ The weekly program involves running the JTAG tests in addition to the
daily test groups. Figure 3 illustrates this procedure in transcript form with
cross references to detailed descriptions in Section 3.3.2.

If the system includes I/O hardware, several I/O tests provided by cmdiag 
are added to the weekly regimen. This expanded procedure is illustrated 
in Figure 4 with cross references to detailed descriptions in Section 3.3.3. 

NOTE: Currently, the weekly maintenance procedure is not compatible
with user partitions and, so, requires exclusive use of the system.

cmdiag includes an interface to the cmpartition software. This interface
allows you to use the high-level cmpartition functions to restrict the scope of
cmdiag to a subset of the system hardware. Tests run within that partition will
not interfere with user applications running in other partitions.
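
The two schedules summarized in this section can be written down as a small planning function. This is a sketch only: the pm_plan name, the dictionary layout, and the ordering of the weekly entries (taken from the order of the phases in Section 3.3.3) are choices made for illustration, not part of cmdiag.

```python
DAILY_GROUPS = [
    "Processing Node test group",
    "Data Network verifier group",
    "Control Network verifier group",
]


def pm_plan(schedule, has_io=False):
    """Summarize what a preventive maintenance session involves."""
    if schedule == "daily":
        # The daily routine runs inside one partition; I/O is left to the weekly run.
        return {"exclusive_use": False, "tests": list(DAILY_GROUPS)}
    if schedule == "weekly":
        tests = ["JTAG tests"]
        if has_io:
            tests.append("cmdiag I/O tests")
        tests += DAILY_GROUPS
        # The weekly procedure is not compatible with user partitions.
        return {"exclusive_use": True, "tests": tests}
    raise ValueError("schedule must be 'daily' or 'weekly'")
```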
Daily PM Procedure

The system used to illustrate this procedure example has the following attributes.

- System is named Calliope and has 256 PNs.
- Diagnostic server is named homer.think.com.
- Calliope has two 128-PN partitions, which are allocated to partition
  managers named virgil.think.com and milton.think.com.
- Diagnostics will be run on virgil.think.com.

1   login: user_id
    % su
    password: root_password
    SU homer.think.com /dev/console
    hom# cd /usr/diag/cmdiag

2   hom# setenv CMDIAG_PATH /usr/diag/cmdiag
    hom# setenv JTAG_SERVER homer.think.com

3   hom# /usr/etc/cmpartition list -l
    CM System "Calliope"
    256 Processors [ 8 Mbytes memory
    2 Partition Managers
        virgil.think.com
        milton.think.com
    Available PN Ranges:
        All PNs in use
    IOP Addresses

    Name  Partition Manager  Size  State   Nodes    Description
    V128  virgil.think.com   128   ACTIVE  0-127    virgil
    M128  milton.think.com   128   ACTIVE  128-255  milton
                                           480-480

4   hom# rlogin -l root virgil.think.com
    password: QuiVive
    virg# setenv CMDIAG_PATH /usr/diag/cmdiag
    virg# setenv JTAG_SERVER homer.think.com
    virg# cmpartition stop -pm virgil.think.com

(continued on next page)

Figure 2. Daily preventive maintenance (1 of 2)
Daily PM Procedure

(continued from previous page)

5   virg# cmpartition list -l
    CM System "Calliope"
    256 Processors [ 8 Mbytes memory
    2 Partition Managers
        virgil.think.com
        milton.think.com
    All PNs in use
    IOP Addresses
        480

    Name  Partition Manager  Size  Nodes    Description
    V128  virgil.think.com   128   0-127    virgil
    M128  milton.think.com   128   128-255  milton
                                   480-480

6   virg# cmdiag -C -p virgil.think.com
    <CM-DIAG> rgroups m PN global broadcast combine dr

    diagnostic test report

NOTE

When the partition managed by milton becomes available for testing, repeat
steps 4 through 6 on milton.

Figure 2. Daily preventive maintenance (2 of 2)
Weekly PM without I/O

The system used to illustrate this procedure example has the following attributes.

- System is named Calliope and has 256 PNs.
- Diagnostic server is named homer.think.com.
- Calliope has two 128-PN partitions, which are allocated to partition
  managers named virgil.think.com and milton.think.com.
- Diagnostics will be run on homer.think.com.

1   login: user_id
    % su
    password: root_password
    SU homer.think.com /dev/console
    hom# cd /usr/diag/cmdiag

2   hom# setenv CMDIAG_PATH /usr/diag/cmdiag
    hom# setenv JTAG_SERVER homer.think.com

3   hom# /usr/etc/cmpartition list -l
    CM System "Calliope"
    256 Processors [ 8 Mbytes memory
    2 Partition Managers
        virgil.think.com
        milton.think.com
    Available PN Ranges:
        All PNs in use

    Name  Partition Manager  Size  State      Nodes    Description
    V128  virgil.think.com   128   ACTIVE     0-127    virgil
    M128  milton.think.com   128   ALLOCATED  128-255  milton
                                              480-480

4   hom# rlogin -l root virgil.think.com
    password: QuiVive
    virg# setenv CMDIAG_PATH /usr/diag/cmdiag
    virg# setenv JTAG_SERVER homer.think.com
    virg# cmpartition stop
    virg# cmpartition delete
    virg# exit

(continued on next page)

Figure 3. Weekly maintenance with no I/O (1 of 2)
Weekly PM without I/O

(continued from previous page)

4        hom# cmpartition delete -pm milton.think.com
(cont.)

5        hom# cmpartition list -l
         CM System "Calliope"
         256 Processors [ 8 Mbytes memory
         2 Partition Managers
             virgil.think.com
             milton.think.com
         Available PN Ranges:
             All PNs in use
         IOP Addresses
             480

6        hom# ./cmdiag -C
         <CM-DIAG> rgroups m SVME
         <CM-DIAG> rgroups m CLKDN
         <CM-DIAG> rgroups m CLKBUF
         <CM-DIAG> rgroups m SPI
         <CM-DIAG> rgroups m FILLER
         <CM-DIAG> rgroups m PE PEMEM
         <CM-DIAG> rgroups m CN
         <CM-DIAG> rgroups m DR

Figure 3. Weekly maintenance with no I/O (2 of 2)
Preliminary

Weekly PM with I/O

Introductory Notes

The system used to illustrate this procedure example has the following attributes.

- System is named Calliope and has 256 PNs.
- Diagnostic server is named homer.think.com.
- Calliope has two 128-PN partitions, which are allocated to partition
  managers named virgil.think.com and milton.think.com.
- Diagnostics will be run on homer.think.com partition.

*** INITIALIZE SYSTEM ***

1   login: user_id
    % su
    password: root_password
    SU homer.think.com /dev/console
    hom# cd /usr/diag/cmdiag

2   hom# setenv CMDIAG_PATH /usr/diag/cmdiag
    hom# setenv JTAG_SERVER homer.think.com

3   hom# /usr/etc/cmpartition list -l
    CM System "Calliope"
    256 Processors [ 8 Mbytes memory, SPARC IU, SPARC FPU ]
    2 Partition Managers
        virgil.think.com
        milton.think.com
    Available PN Ranges:
        All PNs in use
    IOP Addresses
        480

    Name  Partition Manager  Size  State      Nodes    Description
    V128  virgil.think.com   128   ACTIVE     0-127    virgil
    M128  milton.think.com   128   ALLOCATED  128-255  milton
                                              480-480

4   hom# rlogin -l root virgil.think.com
    password: QuiVive
    virg# setenv CMDIAG_PATH /usr/diag/cmdiag
    virg# setenv JTAG_SERVER homer.think.com
    virg# cmpartition stop
    virg# cmpartition delete
    virg# exit

(continued on next page)

Figure 4. Weekly preventive maintenance with I/O (1 of 3)
Weekly PM with I/O

(continued from previous page)

4        hom# rlogin -l root milton.think.com
(cont.)  password: QuiVive
         milt# setenv CMDIAG_PATH /usr/diag/cmdiag
         milt# setenv JTAG_SERVER homer.think.com
         milt# cmpartition delete
         milt# exit

5        hom# cmpartition list -l
         CM System "Calliope"
         256 Processors [ 8 Mbytes memory, SPARC IU, SPARC FPU ]
         2 Partition Managers
             virgil.think.com
             milton.think.com
         Available PN Ranges
             All PNs in use
         IOP Addresses
             480

6        hom# ./cmdiag -C
         <CM-DIAG> rgroups m

         diagnostic test report

TEST DATAVAULTS

9        dv login: user_id
         password: root_password

         dv# /usr/local/etc/diag/dvcoldboot +c n
         dv# /usr/local/etc/diag/diagserver/diagserver &

(continued on next page)

Figure 4. Weekly preventive maintenance with I/O (2 of 3)
Figure 4. Weekly preventive maintenance with I/O (3 of 3)
3.2 Daily Preventive Maintenance

3.2.1 Initial Conditions 

The diagnostic procedure described in Section 3.2.2 assumes that the partition
from which you will run cmdiag already exists. If this is not the case and you
need to create the partition, refer to the cmpartition man page in Appendix
M for instructions.

NOTE: Currently, the only cmdiag test groups that may be run within a partition
without affecting other partitions are the PE and verifier test groups.
Consequently, only these functions will be used during the daily preventive
maintenance activities.


3.2.2 Diagnostic Procedure 

Perform the procedure described below each day. 

NOTE: If your CM system includes I/O facilities, these will be tested during the 
weekly maintenance sessions when you have full use of the system. 

1. Log in at the CM-5 System Administration Console as root, and change
directory to /usr/diag/cmdiag.

login: user_id

password: root_password

# cd /usr/diag/cmdiag

2. Set the CMDIAG_PATH and JTAG_SERVER environment variables. The
default CMDIAG_PATH is /usr/diag/cmdiag. The JTAG_SERVER
variable must specify the hostname of the diagnostic server.

# setenv CMDIAG_PATH /usr/diag/cmdiag

# setenv JTAG_SERVER diag_server_hostname

3. Run cmpartition list -l to be certain you have an accurate
understanding of the current partitioning status of the CM: what partition
configurations are in effect, their names, the hostnames of their partition
managers, and their state of use.

# /usr/etc/cmpartition list -l
4. If cmpartition list -l reports the state of the target partition as
active, it means ts-daemon is running on that partition. If so, rlogin to
the appropriate partition manager and run cmpartition stop to halt the
timesharing daemon. Then exit.

NOTE: The following example shows CMDIAG_PATH and JTAG_SERVER
being set. If these environment variables are already set on this partition
manager, this step can be skipped.

# rlogin -l root pm_name
password: root_password

# setenv CMDIAG_PATH /usr/diag/cmdiag

# setenv JTAG_SERVER diag_server_hostname

# /usr/etc/cmpartition stop

# exit

pm_name is the name of the targeted partition manager.

5. Run cmpartition list -l again. The target partition should now
show an allocated status. This means the partition is defined and still
associated with its partition manager, but the timesharing daemon is not
running.

6. Run the daily preventive maintenance test groups.

# cmdiag -C -p pm_name

<CM-DIAG> rgroups m PN dr combine global broadcast partition

<CM-DIAG>

The -p pm_name option specifies the partition in which cmdiag will be
run; pm_name is the hostname of the Partition Manager.

7. If any test fails, record the messages generated by the tests and notify
Thinking Machines product support at (617) 234-4000.

If no test fails, the daily preventive maintenance procedure is now
complete. Return the CM-5 to regular use.
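
Steps 4 through 6 depend only on the state that cmpartition list -l reports for the target partition. The sketch below lays out that dependency as a function returning the commands to type, in order. The function name is hypothetical, the command strings are copied from the steps above, and nothing here executes anything on a CM-5.

```python
def daily_pm_commands(state, pm_name):
    """Return the commands for steps 4 through 6, given the partition state."""
    state = state.upper()
    if state not in ("ACTIVE", "ALLOCATED"):
        raise ValueError("the partition must already exist (see Section 3.2.1)")
    commands = []
    if state == "ACTIVE":
        # ts-daemon is running, so halt it from the partition manager itself.
        commands += [
            "rlogin -l root " + pm_name,
            "/usr/etc/cmpartition stop",
            "exit",
        ]
    commands += [
        "/usr/etc/cmpartition list -l",   # step 5: confirm ALLOCATED status
        "cmdiag -C -p " + pm_name,        # step 6: run within the partition
        "rgroups m PN dr combine global broadcast partition",
    ]
    return commands
```

For a partition already in the ALLOCATED state the stop sequence is skipped and the list begins with the confirmation listing.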
3.3 Weekly Preventive Maintenance

3.3.1 Initial Conditions

The weekly preventive maintenance procedure requires that you have exclusive 
use of the system for the duration of the test session. 

The test sequence differs greatly depending on whether or not there is I/O
hardware to be tested. Section 3.3.2 describes the procedure for systems with no I/O.
Section 3.3.3 covers systems with I/O.


3.3.2 Weekly Test Procedure with No I/O 

The following procedure is summarized in Figure 3 for quick reference. 

1. Log in at the CM-5 System Administration Console as root, and change
directory to /usr/diag/cmdiag.

login: user_id

password: root_password

# cd /usr/diag/cmdiag

2. Set the CMDIAG_PATH and JTAG_SERVER environment variables. The
default CMDIAG_PATH is /usr/diag/cmdiag. The JTAG_SERVER
variable must specify the hostname of the diagnostic server.

# setenv CMDIAG_PATH /usr/diag/cmdiag

# setenv JTAG_SERVER diag_server_hostname

3. Stop and delete all partitions. To do this, you need to know the hostname
of each partition manager to which a partition is allocated. If necessary,
run cmpartition list -l to get this information.

# /usr/etc/cmpartition list -l

4. Then run cmpartition stop and cmpartition delete on every
partition manager that has an active partition. Run cmpartition
delete on every partition manager that has an allocated partition.

For example, if cmpartition list shows virgil.think.com as
active and milton.think.com as allocated, perform the steps
shown below.
NOTE: This example is structured to demonstrate certain characteristics of
the cmpartition stop and delete commands.

■ Because cmpartition stop must be performed on the partition
manager controlling the partition to be stopped, this example
includes an rlogin to virgil, which has an active partition.

■ cmpartition delete, however, can be done remotely.
Consequently, the inactive partition on milton is deleted from the
diagnostic console. See step 4 (cont.) in Figure 3.

# rlogin -l root virgil.think.com
password: root_password

virg# /usr/etc/cmpartition stop
virg# /usr/etc/cmpartition delete
virg# exit

# /usr/etc/cmpartition delete -pm milton.think.com

#

NOTE: This example assumes that CMDIAG_PATH and JTAG_SERVER are
already set appropriately on both partition managers. If these environment
variables are not correct, log in to each partition manager and set them as
follows.

# rlogin -l root virgil.think.com

password: root_password

virg# setenv CMDIAG_PATH /usr/diag/cmdiag
virg# setenv JTAG_SERVER diag_server_hostname

virg# exit

# rlogin -l root milton.think.com
password: root_password

milt# setenv CMDIAG_PATH /usr/diag/cmdiag
milt# setenv JTAG_SERVER diag_server_hostname

milt# exit

5. Run cmpartition list -l again. It should report no partitions either
ACTIVE or ALLOCATED.

6. Run the manufacturing version of the JTAG test group.

# ./cmdiag -C
<CM-DIAG> rgroups m SVME
<CM-DIAG> rgroups m CLKDN
<CM-DIAG> rgroups m CLKBUF
<CM-DIAG> rgroups m SPI
<CM-DIAG> rgroups m FILLER
<CM-DIAG> rgroups m PE PEMEM
<CM-DIAG> rgroups m CN
<CM-DIAG> rgroups m DR

7. If any test fails, record the error messages generated by the tests and notify
Thinking Machines product support at (617) 234-4000.

If no test fails, go to step 8.

8. Create a partition that encompasses all PNs in the system. Enter the lowest
and highest PN network addresses for first_pn-last_pn, respectively.

<CM-DIAG> q

# /usr/etc/cmpartition create -pn_range first_pn-last_pn

9. Execute a system reset and reset the Partition Manager's interface module.
Then run the processor chip tests, followed by the Data Network and
Control Network verifiers.

# cmreset

# cmreset -s

# ./cmdiag -C -p pm_name

<CM-DIAG> rgroups m PN dr combine global broadcast partition

pm_name is the hostname of the Partition Manager.

10. If any test fails, record the error messages generated by the tests and notify
Thinking Machines product support at (617) 234-4000.

If no test fails, go to step 11.

11. If the system has multiple Partition Managers, repeat steps 8 and 9, using
a different Partition Manager each time.

cmreset -s must be repeated for each Partition Manager that is used to
run cmdiag.

12. When the CM-5 passes all tests invoked in steps 6 through 9, the
preventive maintenance session is complete. Return the system to regular use.
This requires stopping and deleting the system-wide partition created in
step 8 and recreating and starting the partitions deleted in step 4.
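
The rule in step 4 (stop and delete active partitions locally, delete allocated partitions remotely) can be sketched as follows. The teardown_commands name and the (hostname, state) pair format are assumptions made for this illustration. The command strings are the ones shown in the step 4 example, and the function only builds the list; it runs nothing.

```python
def teardown_commands(partitions):
    """Build the step 4 command list from (pm_hostname, state) pairs."""
    commands = []
    for pm_hostname, state in partitions:
        if state == "ACTIVE":
            # cmpartition stop must run on the partition manager itself.
            commands += [
                "rlogin -l root " + pm_hostname,
                "/usr/etc/cmpartition stop",
                "/usr/etc/cmpartition delete",
                "exit",
            ]
        elif state == "ALLOCATED":
            # cmpartition delete can be issued remotely with -pm.
            commands.append("/usr/etc/cmpartition delete -pm " + pm_hostname)
        else:
            raise ValueError("unexpected partition state: " + state)
    return commands
```

With virgil.think.com ACTIVE and milton.think.com ALLOCATED, the result matches the example transcript in step 4.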
3.3.3 Weekly Test Procedure with I/O

The weekly preventive maintenance procedure is described below. Because it 
involves many steps, its description is organized into several phases to minimize 
confusion. The procedure is also summarized in Figure 4 for quick reference. 

INITIALIZE SYSTEM 

The following steps take the system from its normal operating configuration, 
preparing it for the first diagnostics sequence. 

1. Log in at the CM-5 System Administration Console as root, and change
directory to /usr/diag/cmdiag.

login: user_id
password: root_password

# cd /usr/diag/cmdiag

2. Set the CMDIAG_PATH and JTAG_SERVER environment variables. The
default CMDIAG_PATH is /usr/diag/cmdiag. The JTAG_SERVER
variable must specify the hostname of the diagnostic server.

# setenv CMDIAG_PATH /usr/diag/cmdiag

# setenv JTAG_SERVER diag_server_hostname

3. Stop and delete all partitions. To do this, you need to know the hostname
of each partition manager to which a partition is allocated. If necessary,
run cmpartition list -l to get this information.

# /usr/etc/cmpartition list -l

4. Then run cmpartition stop and cmpartition delete on every
partition manager that has an active partition. Run cmpartition
delete on every partition manager that has an allocated partition.

For example, if cmpartition list shows virgil.think.com as
active and milton.think.com as allocated, do the following.

# rlogin -l root virgil.think.com
password: root_password

virg# /usr/etc/cmpartition stop
virg# /usr/etc/cmpartition delete
virg# exit

# rlogin -l root milton.think.com
password: root_password

milt# /usr/etc/cmpartition delete
milt# exit
#

5. Run cmpartition list -l again. It should report no partitions either
ACTIVE or ALLOCATED.

RUN COMPLETE JTAG TESTS 

6. Run the manufacturing version of cmdiag rgroups. This will perform
the complete JTAG test suite, including all IOBA hardware identified in the
io.conf configuration file.

# ./cmdiag -C

<CM-DIAG> rgroups m

7. If any test fails, record the error messages generated by the tests and notify
Thinking Machines product support at (617) 234-4000.

If no test fails, go to step 8.
TEST DATAVAULTS

8. If the system includes DataVaults, perform steps 9 through 14. If there are
no DataVaults to test, skip to step 15.

9. Log on to the station manager of the first DataVault you plan to test and
set the command-channel mode by running dvcoldboot +c n. n specifies
which DataVault port will be used; use either 0 or 1.

While you are at the DataVault console, start its diagnostic server running
in background. The DataVault diagnostic server will be needed in step 17.

login: user_id
password: root_password

dv# /usr/local/etc/diag/dvcoldboot +c n

dv# /usr/local/etc/diag/diagserver/diagserver &

10. Run the iopdv test from within cmdiag.

<CM-DIAG> execute-all-iopdv-tests

NOTE: If the IOP and DataVault station IDs and the DataVault starting
block are not already defined, you will be prompted to supply them.
Specify a DataVault starting block address no higher than 960; this will ensure
that test data will not exceed the 1024-block zone reserved for diagnostic
use on the DataVault.

11. If any test fails, record the error messages generated by the tests and notify
Thinking Machines product support at (617) 234-4000.

If no test fails, go to step 12.

12. Run the ioppe tests from within cmdiag.

<CM-DIAG> execute-all-ioppe-tests

13. If any test fails, record the error messages generated by the tests and notify
Thinking Machines product support at (617) 234-4000.

If no test fails, go to step 14.

14. Repeat steps 9 through 11 for each DataVault in the system. Then go on
to step 15.
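
The starting-block limit in the note under step 10 is a bounds check, sketched below. Two things here are assumptions rather than statements from this manual: that the 1024-block diagnostic zone begins at block 0, and that the tests write at most 64 blocks, a figure inferred only from the difference between 1024 and 960.

```python
DIAG_ZONE_BLOCKS = 1024   # zone reserved for diagnostic use (note under step 10)
MAX_START_BLOCK = 960     # highest recommended starting block (same note)

# Assumption: the test footprint is the difference between the two figures.
ASSUMED_TEST_BLOCKS = DIAG_ZONE_BLOCKS - MAX_START_BLOCK


def stays_in_diag_zone(start_block, test_blocks=ASSUMED_TEST_BLOCKS):
    """True if test data written from start_block stays inside the zone.

    Assumes the diagnostic zone covers blocks 0 through 1023.
    """
    if start_block < 0:
        raise ValueError("starting block cannot be negative")
    return start_block + test_blocks <= DIAG_ZONE_BLOCKS
```

Under these assumptions a starting block of 960 just fits, while 961 would run past the end of the zone.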
TEST CM-HIPPI and CM-IOPG

15. If CM-HIPPI and/or VMEIO devices are also attached to the CM-5, log on
to their station managers as root and start their diagnostic servers running
in background. Otherwise, just proceed to step 16.

16. Verify that the file cmio_config.machine_name is present on the
System Administration Console. It will be used by the end-to-end tests, which
will be executed next.

17. Now, run the cmdiag end-to-end tests. The following command will
automatically invoke the appropriate tests for all DataVaults, CM-HIPPIs, and
VMEIO devices connected to the CM-5.

<CM-DIAG> test-cmio-device-data-xfer

18. If any test fails, record the error messages generated by the test and notify
Thinking Machines product support at (617) 234-4000.

If no test fails, go to step 19.

CREATE SYSTEM-WIDE PARTITION and RUN 
PROCESSOR TESTS and NETWORK VERIFIERS 

19. Create a partition that encompasses all PNs in the system. Enter the lowest
and highest PN network addresses for first_pn-last_pn.

<CM-DIAG> q

# /usr/etc/cmpartition create -pn_range first_pn-last_pn

20. Execute a system reset and reset the Partition Manager's interface module.
Then run the processor chip tests, followed by the Data Network and
Control Network verifiers.

# cmreset

# cmreset -s

# cmdiag -C -p pm_name

<CM-DIAG> rgroups m PE dr combine global broadcast partition

<CM-DIAG>

pm_name is the hostname of the Partition Manager and specifies the
partition in which cmdiag will be run.

21. If any test fails, record the error messages generated by the tests and notify
Thinking Machines product support at (617) 234-4000.

If no test fails, go to step 22.
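
The first_pn-last_pn argument in step 19 is simply the lowest and highest PN address across all partitions. The sketch below derives it from Nodes entries such as those shown in the cmpartition list -l output in Figure 4. The function names are hypothetical, and the caller is assumed to pass only PN ranges, leaving out IOP entries such as 480-480.

```python
def pn_range(node_ranges):
    """Return 'first_pn-last_pn' spanning every range such as '0-127'."""
    bounds = []
    for text in node_ranges:
        low, high = (int(part) for part in text.split("-"))
        bounds.append((low, high))
    first = min(low for low, _ in bounds)
    last = max(high for _, high in bounds)
    return f"{first}-{last}"


def create_command(node_ranges):
    """Build the step 19 command line for a system-wide partition."""
    return "/usr/etc/cmpartition create -pn_range " + pn_range(node_ranges)
```

For the example system, the ranges 0-127 and 128-255 give a -pn_range argument of 0-255.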
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RUN I/O VERIFIERS 

* 

22. When the CM-5 passes all tests invoked up through step 19, it is time to 
run the system verifiers that include full-speed I/O. This procedure begins 
at step 23. 

23. Ensure that fsserver is running on all DataVauIis, CM-HIPPIs, and 
VMEIO devices connected to the CM-5. 

24. Start the timesharing daemon on the partition created in step 19. 

<CM-DIAG> q 

# /uar/fitc/cmpartiticn start -cmd ta-daemon 

25. Next, choose one Data Vault or VMEIO device and set the dvwd environ¬ 
ment variable to specify Uml device. server_nume is the hostname of die 
file server running on the DataVault or VMEIO. 

# eotenv DVWD serverjiamei 

26. Run the hardware portion of dvteats. Use die -g argument to specify a 
geometry that will produce a data block size appropriate for die I/O device. 
For example, the recommended geometry values for a DataVault are: 

I /uar/diag/tsd/dvfcastS —h -g 64, 64 

This will produce 16-Kbytc blocks, which matches the DataVault block 
size. Smaller block sizes are typically used for VMEIO devices, the exact 
size depending on the storage characteristics of the device. 

27. If dvtestS fails, record the error messages generated by the tests and 
notify Thinking Machines product support — (617) 234-4000. 

If it does not fail, go to step 28. 

28. Repeat steps 24 through 27 for every DataVault and VMEIO device con¬ 
nected to the CM-5. 

29. When dvteatS has been run on all DataVauIis and VMEIO devices, run 
the hippi-ioop verifier for each CM-HIPPI connected to the CM-5. 
Change the dvwd environment variable to specify the CM-I1IPPI. 

# sotenv DVWD server name-. 

# /usr/diag/tad/hippi-loop 

30. If hippi-ioop fails, record the error messages generated by the tests and 
notify Thinking Machines product support — (617) 234-4000. 
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If ii docs not fail, go to step 31. 

31. Repeat steps 29 and 30 for each CM-HIPPI device. 

32. When all DataVault, VMEIO, and CM-HIPPI devices have passed 
dvtest5 and hippi-loop, the weekly preventive maintenance session 
is complete. 

Return the CM-5 and its I/O devices to regular use. To do this, stop and 
delete the system-wide partition created in step 19 and recreate and restart 
the partitions deleted in step 4. 
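
Steps 25 through 31 repeat the same pair of commands for each I/O device, so it can help to write the sequence out before starting. The following Python sketch is illustrative only and is not part of the CM-5 software: it builds the command lines as text and runs nothing. The function name io_verifier_plan and the server names are hypothetical, the command paths are as read from the steps above, and the geometry shown is the DataVault recommendation from step 26; VMEIO devices will typically need a different value.

```python
def io_verifier_plan(storage_servers, hippi_servers, geometry="64,64"):
    """Return, in order, the shell commands for steps 25 through 31.

    storage_servers: DVWD values for DataVault and VMEIO file servers.
    hippi_servers:   DVWD values for CM-HIPPI devices.
    Nothing is executed; the list is meant to be reviewed by a person.
    """
    plan = []
    for server in storage_servers:
        plan.append(f"setenv DVWD {server}")
        plan.append(f"/usr/diag/tsd/dvtest5 -h -g {geometry}")
    for server in hippi_servers:
        plan.append(f"setenv DVWD {server}")
        plan.append("/usr/diag/tsd/hippi-loop")
    return plan


if __name__ == "__main__":
    # Hypothetical server names, for illustration only.
    for command in io_verifier_plan(["dv1:", "dv2:"], ["hippi1:"]):
        print("#", command)
```

Because the sketch only prints text, it is safe to run anywhere. If any verifier fails, stop and follow step 27 or step 30 rather than continuing down the list.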
Chapter 4 


System Startup and Shutdown 




This chapter describes the procedure for bringing a CM-5 from a powered-down 
condition to the state where it is ready to run user programs. It shows how to 
create partitions and start the timesharing daemon running on them. It also 
explains how to stop the timesharing daemon, delete partitions, and shut down the 
CM-5. 

These procedures are presented in several levels of detail, from a high-level view 
of the general tasks to detailed descriptions of each step. 

■ Figure 5 and Figure 6 identify the major tasks involved in powering a 
CM-5 system up and down, including partition creation and control. 

■ Figure 7 and Figure 8 present the individual steps involved in each 
power-up and power-down task in a quick-reference format. 

■ Sections 4.1 and 4.2 provide detailed descriptions of these procedures. 

The power-up procedure assumes that the CM-5 is completely installed (hard¬ 
ware and software), including all cabling. 
Figure 5. CM-5 startup procedure. 

CM-5 Startup Procedure — Quick Reference 

1. Boot up each external Control Processor. 

2. If the system includes I/O, power up all I/O devices. 

3. Power up the CM-5 cabinet(s). In multiple-cabinet systems, power up the network 
cabinet first. 

4. Log in to the system administration console as root. 

5. If the system's hardware configuration has been changed since the last boot session, 
update /etc/cm/configuration/hardware.install to reflect the changes. 

6. If the system's I/O configuration has been modified since the system was last 
booted, update /etc/io.conf to reflect the changes. 

7. Set CMDIAG_PATH to specify the diagnostic library pathname and JTAG_SERVER to 
point to the diagnostic server. Set these environment variables on all Control 
Processors. The default CMDIAG_PATH is /usr/diag/cmdiag. 

# setenv CMDIAG_PATH /usr/diag/cmdiag 

# setenv JTAG_SERVER diagserverhostname 

8. Create the desired partitions. This and subsequent steps may be implemented by a 
script. If not, run cmpartition create for each partition. For example: 

# /usr/etc/cmpartition create -pm homer -pn_range 0-63 

# /usr/etc/cmpartition create -pm milton -pn_range 64-127 

9. Run cmreset to reset the system hardware and cmreset -s on each partition 
manager to reset the partition manager's interface module. 

10. If the CM includes I/O, initialize each IOBA by running io_cold_boot. 

# /usr/etc/io_cold_boot 

11. Start each partition by running a separate cmpartition start on the associated 
partition managers. 

# /usr/etc/cmpartition start -cmd ts-daemon 

# /usr/etc/cmpartition start -cmd ts-daemon 
Figure 7. CM-5 startup procedure. 

CM-5 Shutdown Procedure — Quick Reference 

1. Stop timesharing daemons on all partitions. Log in as root to each Partition 
Manager and run cmpartition stop. 

# /usr/etc/cmpartition stop 

2. Delete all partitions. This can be done from the system administration 
console. 

# /usr/etc/cmpartition delete -pm homer 

# /usr/etc/cmpartition delete -pm milton 

3. Halt and then power down all Control Processors. 

4. Turn off CM-5 power supplies. 

5. If I/O is included, halt the station manager of each I/O device and power 
down the device. 

Figure 8. CM-5 shutdown procedure. 


4.1 System Startup 

The startup procedure is organized into 11 steps. These steps are summarized in 
Figure 7 for quick reference. Background details for the various steps are 
presented in the balance of this section. 


4.1.1 Boot External Control Processors 

Power up any external Control Processors and verify that their boot sequence is 
successful. The location of the power switch will depend on which Sun model 
is used to implement the Control Processor. If you have any questions about this 
step, refer to the applicable Sun documentation. 


4.1.2 Power Up All Peripherals 

If the CM-5 system includes peripheral devices, such as DataVault, CM-HIPPI, 
CM-IOPG, and/or other VMEIO devices, apply power to their power supplies and 
boot up their station managers. 


4.1.3 Power Up the CM 

Each cabinet in the CM-5 system is equipped with its own set of power switches. 

On device cabinets, these switches are located behind the louvered corner panel 
that covers the cabinet's power supplies. See Figure 9. The panel is held closed 
by magnetic latches along the main face of the cabinet and is hinged on the 
cabinet's end wall. To open the panel, briefly press against the latched face and then 
release; the panel should swing out away from the cabinet, exposing the power 
supply bay. Again see Figure 9. 

Network cabinets have their circuit breakers on the opposite side of the cabinet, 
as shown in Figure 10. To reach these switches, slide the covering panel to the 
right. 

Figure 9. Access to device cabinet circuit breakers. 


October 9,1992 







36 CM-5 Field Service Guide—Preliminary 



Figure 10. Access to network cabinet circuit breakers. 

NOTE: In systems with multiple cabinets, the network cabinet contains a master 
clock, to which all other clocks are referenced. In such systems, power up the 
network cabinet first, to permit the master clock time to stabilize before the other 
cabinets come on line. 

Turn on network cabinet power in the following sequence. 

■ Main circuit breaker 

■ PDU (Power Distribution Unit) controller 

■ Contactor 

■ LEDs 

■ +5 V supplies (any order) 

■ +2 V supplies (any order) 

Apply power to the device cabinet power supplies in the following sequence. 
Figure 11 shows the locations of the referenced circuit breakers. Repeat this se¬ 
quence for all device cabinets in the system. 

■ Main circuit breaker — CB1 

■ AC and DR/CN — CB3 and CB4 

■ +5 V supplies — CB5, CB7, CB9, CB11, CB13, CB15, CB17, and CB19 

■ +2 V supplies — CB6, CB8, CB10, CB12, CB14, CB16, CB18, and CB20 

■ LEDs — CB21 and CB22 

When the processing nodes power up, their boot-mode sequence is indicated on 
the LED panel by a left-to-right “chase pattern.” After applying power to the CM 
cabinets, check their LED panels to verify that they have booted successfully. If 
this pattern is not displayed, check to see that the cabinet’s LED switch is in the 
"fast copy" mode. Table 1 summarizes the various LED mode settings for this 
switch. See Figure 9 for the location of the LED switch. 

Figure 11. Device cabinet circuit breaker locations. 

Table 1. LED Mode Switch Settings. 


HEX
VALUE   MODE                  DEFINITION

0*      Freeze LEDs           Maintains current state of LEDs.

1       Fast Copy             Copies PN values to LEDs; special
        (default)             boot-mode sequence generates
                              left-to-right chase pattern.

3       Interleaved Copy      Displays PN values on LEDs; groups
                              of four even-numbered rows shift their
                              display to the left and odd-numbered
                              groups of four rows shift to the right.

5       Random                Generates random patterns.

7       Interleaved Random    Same shifting behavior as mode 3 but
                              with values supplied by random number
                              generator.

9       LEDs On               Turns all LEDs on.

A       LEDs Off              Turns all LEDs off.

B       Blink                 Alternately turns all LEDs on and off; on
                              for 0.5 second and off for 0.5 second.

C       Test                  Continuously runs powerup self tests;
                              these tests are described in the CM-5
                              Field Service Guide.

D       Display PM Loop       Used in hardware diagnostic sessions to
                              trace connectivity problems.


* Switch settings 0, 2, 4, 6, 8, E, and F all specify Freeze mode. 
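
When recording switch positions during a service call, the table and its footnote can be captured as a small lookup. The following Python sketch is offered only as a note-taking aid and is not part of the CM-5 software; the names LED_MODES and led_mode are hypothetical. It encodes the nine named settings and treats every other hex digit as Freeze, as the footnote states.

```python
# Mode names transcribed from Table 1; settings absent from this
# dictionary (0, 2, 4, 6, 8, E, F) all select Freeze mode.
LED_MODES = {
    0x1: "Fast Copy",
    0x3: "Interleaved Copy",
    0x5: "Random",
    0x7: "Interleaved Random",
    0x9: "LEDs On",
    0xA: "LEDs Off",
    0xB: "Blink",
    0xC: "Test",
    0xD: "Display PM Loop",
}


def led_mode(setting):
    """Return the mode name for a single hex switch setting such as '1' or 'B'."""
    value = int(setting, 16)  # raises ValueError for non-hex input
    if not 0 <= value <= 0xF:
        raise ValueError(f"switch setting out of range: {setting!r}")
    return LED_MODES.get(value, "Freeze LEDs")
```

For example, led_mode("1") gives the default Fast Copy mode, while led_mode("E") gives Freeze LEDs.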


4.1.4 Log in to the System Administration Console 

Log in to the System Administration Console as root. 


4.1.5 Update hardware.install if Hardware Has Changed 

If any CM-5 hardware components have been installed, removed, or repositioned 
since the last time the system was powered up, update 
/etc/cm/configuration/hardware.install to reflect the changes. One-to-one hardware 
replacements do not require editing the file since the net change is zero. 

hardware.install is created at the factory to define the specific hardware 
configuration of a given system. So long as the system's hardware configuration 
remains unchanged, this file will require no attention. 

Appendix A describes the hardware.install contents. If you are still uncertain 
about how to edit this file, please contact your Thinking Machines Corporation 
representative for guidance. 


4.1.6 Update io.conf if I/O Bus Configuration Has Changed 

If the system’s I/O bus configuration has changed since the system was last 
powered up, edit /etc/io.conf to incorporate those changes. If the change also 
involves adding, removing, or relocating any IOBA hardware, you will need to 
edit hardware.install as well. 

io.conf defines the bus attributes, such as station ID and bus arbiter status, of 
the I/O devices connected to the E-CMIO bus. 

Appendix B describes io.conf. If you are uncertain about how to edit this file, 
please contact your Thinking Machines Corporation representative for guidance. 


4.1.7 Set Environment Variables 

Two environment variables, CMDIAG_PATH and JTAG_SERVER, must be set 
correctly to ensure reliable partitioning behavior. 

CMDIAG_PATH pathname    This variable identifies the home directory of the 
                        JTAG library, which contains information needed 
                        by the partitioning software. The factory-set 
                        default is /usr/diag/cmdiag. 

JTAG_SERVER hostname    This variable specifies the Control Processor on 
                        which the JTAG server is running. hostname 
                        must be the name of the System Administration 
                        Console. 
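
Since both variables must be set on every Control Processor, a quick check before partitioning can save a failed attempt. The Python sketch below is an assumption-laden illustration, not a Thinking Machines utility: the names REQUIRED and missing_variables are hypothetical, and it only reports which of the two variables are unset or empty in the environment it is given. It does not verify that the values are correct.

```python
import os

# The two variables this section says must be set.
REQUIRED = ("CMDIAG_PATH", "JTAG_SERVER")


def missing_variables(environ=None):
    """Return the required variable names that are unset or empty.

    environ defaults to the current process environment; a plain
    dictionary can be passed instead for checking saved settings.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        print("not set:", ", ".join(missing))
    else:
        print("CMDIAG_PATH and JTAG_SERVER are both set")
```

An empty result means only that both variables have some value; confirm the values themselves against the definitions above.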


4.1.8 Check the Current Partitioning State 

If you are at all uncertain about the current availability of Processing Nodes and 
Control Processors, use the cmpartition list -l command to display this 
information. This step is entirely optional. 


4.1.9 Bringing up a System — Example 

Figure 12 provides a sample listing showing the steps involved in creating and 
activating partitions in a newly powered up system. The example shown in 
Figure 12 represents a system named calliope with 128 Processing Nodes. The 
system contains two Control Processors named homer and milton; homer is the 
System Administration Console. 

The first page of Figure 12 contains a summary of the various steps in the se¬ 
quence organized into seven sections. 

Listing Summary 

Section  Description 

1   The environment variables CMDIAG_PATH and JTAG_SERVER must be set 
    to the appropriate values. The setenv commands are being run from homer, 
    the System Administration Console. 

2   cmpartition list -l shows the current state of partitioning in the CM. 
    In this example, the system calliope has 128 PNs and two partition managers, 
    homer and milton. There are no partitions in the current configuration, 
    leaving all PNs available for new partitions. 

3   Next, two partitions are created, each containing 64 PNs. 

4   cmpartition list -l is run again to verify that the desired partitions 
    were successfully created. 

5   Before the timesharing daemon can be started, the system hardware must 
    be reset. In addition, the interface board in the partition manager must be 
    reset; this is done by the -s flag. 

6   The timesharing daemon is started on the partition managed by homer. The 
    -k argument to ts-daemon tells which file contains the OS kernel that is 
    to be downloaded. In this case, the default file is kernel.hw. 

7   Again, cmpartition list -l is run to verify that the timesharing daemon 
    is running. The response shows the partition managed by homer is 
    ACTIVE while the other partition is still only ALLOCATED. 


Figure 12. Bringing up a system — listing example (page 1 of 3). 

(login portion of listing has been omitted) 

homer# setenv CMDIAG_PATH /usr/diag/cmdiag 
homer# setenv JTAG_SERVER homer 

homer# cmpartition list -l 
CM System "calliope" 
128 Processors [ 16 Mbytes memory, SPARC IU, SPARC FPU ] 
2 Partition Managers 
    homer.think.com 
    milton.think.com 
Available PN Ranges: 
    0-127 

homer# 
homer# 

start -cmd ts-daemon -k kernel.hw 


(continued on next page) 


Figure 12. Bringing up a system — listing example (page 2 of 3). 

(continued from previous page) 

homer# 
homer# cmpartition list -l 
CM System "calliope" 
128 Processors [ 16 Mbytes memory, SPARC IU, SPARC FPU ] 
2 Partition Managers 
    homer.think.com 
    milton.think.com 
Available PN Ranges: 
No IOP on This Machine 

Name     Partition Manager    Size   State       Nodes 
homer    homer.think.com      64     ACTIVE      0-63 
milton   milton.think.com     64     ALLOCATED   64-127 

homer# rlogin -l root milton 
Password: root_password 
milt# setenv CMDIAG_PATH /usr/diag/cmdiag 
milt# setenv JTAG_SERVER homer 
milt# cmreset -s 
milt# /usr/etc/cmpartition start -cmd ts-daemon 
milt# exit 
homer# 


Figure 12. Bringing up a system — listing example (page 3 of 3). 
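
When a system has many partitions, it can be convenient to pick out the ones that are not yet ACTIVE from a saved copy of the partition report. The Python sketch below is a hedged illustration and not a supported tool. It assumes each partition row carries five whitespace-separated fields (name, partition manager, size, state, and node range, as in the rows for homer and milton above); the real report layout may differ between software releases. The function names are hypothetical.

```python
def parse_partition_rows(lines):
    """Extract partition rows from saved report text.

    A line is treated as a partition row only if its third field is a
    number (the size) and its fifth field looks like a range "m-n".
    Header lines, blank lines, and anything else are skipped.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5 or not fields[2].isdigit() or "-" not in fields[4]:
            continue
        first, _, last = fields[4].partition("-")
        if not (first.isdigit() and last.isdigit()):
            continue
        rows.append({
            "name": fields[0],
            "manager": fields[1],
            "size": int(fields[2]),
            "state": fields[3],
            "first_pn": int(first),
            "last_pn": int(last),
        })
    return rows


def not_yet_active(rows):
    """Return the partition managers whose partitions are not ACTIVE."""
    return [row["manager"] for row in rows if row["state"] != "ACTIVE"]
```

Applied to the two rows shown in the figure, not_yet_active reports milton.think.com, which is the partition manager that still needs cmpartition start.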


4.1.10 Create User Partitions 

Create partitions with the cmpartition create command. Run this command 
on the System Administration Console as shown in Figure 12. 

To associate an IOBA with a partition, include the -iop option in the create 
argument list. The following example creates two 64-PN partitions, homer and 
milton, and associates an IOBA with each. One IOBA is at network address 131 
and the other is at address 195. In this example, homer serves as both system 
administration console and partition manager. 

homer# cmpartition create -pm homer -pn_range 0-63 -iop 131 
homer# cmpartition create -pm milton -pn_range 64-127 -iop 195 
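
The PN ranges in an example like this follow from simple arithmetic: 128 PNs divided between two partition managers gives 64 PNs each, numbered 0-63 and 64-127. The Python sketch below works that arithmetic out for an even split and prints the corresponding command text. It is an illustration under stated assumptions, not a partitioning tool: the function name create_commands is hypothetical, an even split is only one possible layout, no -iop option is generated, and the sketch does not check any hardware constraints on partition size or placement.

```python
def create_commands(total_pns, managers):
    """Build cmpartition create command lines for an even split.

    total_pns: number of Processing Nodes in the system.
    managers:  partition manager hostnames, one partition each.
    Returns command text only; nothing is executed.
    """
    if not managers:
        raise ValueError("at least one partition manager is required")
    if total_pns % len(managers) != 0:
        raise ValueError("PN count does not divide evenly among managers")
    size = total_pns // len(managers)
    commands = []
    for index, manager in enumerate(managers):
        first = index * size          # first PN address in this partition
        last = first + size - 1       # last PN address, inclusive
        commands.append(
            f"/usr/etc/cmpartition create -pm {manager} -pn_range {first}-{last}"
        )
    return commands
```

Calling create_commands(128, ["homer", "milton"]) reproduces the two ranges used in this section. Add the -iop option by hand where an IOBA must be associated with a partition.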


4.1.11 Reset the CM and Individual PM Network Interfaces 

Run cmreset on the system administration console to execute a system-wide 
hardware reset. Then reset each partition manager’s network interface by run¬ 
ning cmreset -s on each partition manager. The following example shows two 
partition managers, homer and milton; homer also serves as the system admin¬ 
istration console. 

homer# /usr/diag/cmreset 
homer# /usr/diag/cmreset -s 
homer# rlogin -l root milton 
Password: root_password 
milton# /usr/diag/cmreset -s 

cmreset is required after each power-up cycle to synchronize system clocks and 
initialize all registers and switches to a known state. cmreset -s resets the 
network interface of the partition manager on which it is executed. This reset 
must be performed separately on each partition manager. For example, if your 
system has a partition manager, 


4.1.12 Initialize the I/O System 

If the CM includes I/O devices, coldboot the CM-5 IOBA (Input/Output Bus 
Adapter) hardware. An IOBA is a set of circuit modules within the CM-5 that 
together form the interface to a CMIO bus. If the CM-5 has multiple CMIO buses, 
each is connected to a separate IOBA. 

The command for coldbooting IOBAs is io_cold_boot, which downloads the 
I/O kernel to a processor in the IOBA. This must be done any time cmreset is 
executed. 

NOTE: io_cold_boot depends on configuration information that is generated 
by ts-daemon. Consequently, the Control Processor on which io_cold_boot 
is executed must have run the timesharing daemon at least once before the 
coldbooting operation can be performed. ts-daemon need not be running, however, 
at the time io_cold_boot is invoked. 

The I/O configuration file, io.conf, provides io_cold_boot with the address 
information needed to download the kernel to each IOBA in the system. 

The syntax for using io_cold_boot is as follows. 

/usr/etc/io_cold_boot 

NOTE: io_cold_boot expects three files to reside in the following locations. 

/etc/io.conf 

/usr/etc/io_kernel.hw 

/usr/etc/io_download 

If these files are stored elsewhere, the -I, -k, and -3 switches (respectively) 
must be given to io_cold_boot to point the program to the correct files. 


4.1.13 Activate Partitions 

Once a partition has been created and the resets described in Section 4.1.11 have 
been performed, the partition is ready to be activated. To activate a partition, run 
the cmpartition start command on the Partition Manager associated with 
that partition. 


NOTE 

cmpartition start and cmpartition stop must be 
executed on the Control Processor that is assigned to manage the 
partition being started or stopped. 

When cmpartition start has completed, the partition is ready for use. 


4.2 System Shutdown 

The shutdown procedure is organized into 5 steps. These steps are summarized 
in Figure 8 for quick reference. Background details for the various steps are 
presented in the balance of this section. 


4.2.1 Stop All Timesharing Daemons 

Execute cmpartition stop on each partition to halt the timesharing daemon 
and put the partition into inactive status. The cmpartition stop subcommand 
must be executed separately on each partition manager for every partition 
you wish to deactivate. The stop subcommand syntax is: 

cmpartition stop 


4.2.2 Delete All Partitions 

This step is optional. After stopping a partition, you may deallocate that partition 
with the command cmpartition delete. If you do not do this, the partition 
is automatically reallocated upon restart of the system. The cmpartition delete 
subcommand syntax is: 

cmpartition delete { [-pm hostname] | [-name partitionname] } 

-pm hostname defaults to the hostname of the partition manager on which the 
delete command is executed. 

-name partitionname associates an optional name of your choice with the partition. 
This argument can be used instead of -pm hostname to specify the partition. 
It has no default value. 


4.2.3 Shut Down External Control Processors 

Halt and then power down any external Control Processors. The location of the 
power switch will depend on which Sun model is used to implement the Control 
Processor. If you have any questions about this step, refer to the applicable Sun 
documentation. 


4.2.4 Power Down the CM-5 

Turn off all network cabinet circuit breakers in the following sequence. 

■ +2 V supplies 

■ +5 V supplies 

■ LEDs 

■ Contactor 

■ PDU controller 

■ Main circuit breaker 

Turn off all device cabinet power supplies in the following sequence. 

■ +2 V supplies — CB6, CB8, CB10, CB12, CB14, CB16, CB18, and CB20 

■ AC and DR/CN — CB3 and CB4 

■ +5 V supplies — CB5, CB7, CB9, CB11, CB13, CB15, CB17, and CB19 

■ LEDs — CB21 and CB22 

■ Main circuit breaker — CB1 


4.2.5 Shut Down All Peripherals 

If the system includes I/O, halt the station manager of each I/O device and then 
power down the device. 


Chapter 5 

CM Error Logging System 

Whenever a hardware fault causes the timesharing daemon to exit, the CM error 
logging system records the event in /var/log/cm-errors.log. On these occasions, 
ts-daemon passes information about the fault to the SunOS syslog 
system, which performs the actual logging. syslog's local7 error facility is 
reserved for these messages. 


5.1 Implementing CM Error Logging 

Ordinarily, CM-5 systems are shipped from the factory with error logging 
implemented. If, for any reason, error logging must be enabled in the field, this is 
done by adding the following lines to the end of /etc/syslog.conf: 

# Connection Machine Logging Facility 
local7.debug /var/log/cm-errors.log 

NOTE: These lines must appear just as shown here, including the comment on the 
first line. 

The first field identifies the error facility and the level of filtering to be applied 
to the error reporting. In this case, the error facility is called local7 and the 
error severity level is debug. Selecting debug means all errors will be reported. 

The second field specifies the file in which local7 errors will be logged. 
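
To confirm in the field that error logging is enabled, it is enough to look for an uncommented entry with exactly these two fields. The Python sketch below shows one way to make that check against the text of a syslog.conf file. It is illustrative only: the function name logs_local7 is hypothetical, and the sketch checks only for the presence of the entry, not for whether the syslog daemon has reread its configuration or whether the log file exists.

```python
def logs_local7(conf_text, log_path="/var/log/cm-errors.log"):
    """Return True if conf_text has an active local7.debug entry for log_path.

    Comment lines (starting with '#') and blank lines are ignored.
    Fields may be separated by tabs or spaces.
    """
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) == 2 and fields[0] == "local7.debug" and fields[1] == log_path:
            return True
    return False


if __name__ == "__main__":
    with open("/etc/syslog.conf") as conf:
        print("CM error logging entry present:", logs_local7(conf.read()))
```

If the check reports that the entry is missing, add the two lines shown above to the end of the file.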




5.2 Error Message Description 

Error messages consist of a single line containing 9 fields separated by vertical 
line characters, |. The 9 fields comprising each message are: 

timestamp host | seconds | general error category | 
program name | subprogram name | userid | groupid | 
processid | error message 

timestamp host     Contains a timestamp for the message and the hostname 
                   of the CP on which the timesharing daemon is running. 
                   This field is always terminated with the term syslog: 

seconds            Gives the time in seconds since January 1, 1970. 

general error      Presents a high-level description of the type of error 
                   message being reported. 

program name       Identifies the timesharing daemon as the source of the 
                   message. 

subprogram name    Identifies the user program that was running when the 
                   error was detected. 

userid             Identifies the owner of the program that was running 
                   when the error was detected. 

groupid            Identifies the group associated with the program that 
                   was running when the error was detected. This field is 
                   not currently implemented. Its place in the message 
                   contains the default -1. 

processid          Gives the processid of the program that was running 
                   when the error was detected. 

error message      This is a text string that describes the error. 

The following sample message illustrates the kind of information to be found in 
cm-errors.log. 

Jul 5 11:39:39 yeats syslog: | 673728379 | Hardware in 
error state | Timesharing daemon | /user/prod/fliter.a 
| 1556 (kjr) | -1 (<no group>) | 11837 | FatalInterrupt: 
time sharing detected error on NI. 
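
Because every message has the same nine fields, log entries are easy to split apart when you need to sort or count them. The Python sketch below shows one way to do this and is not a Thinking Machines utility. The field keys and the function name parse_cm_error are hypothetical labels chosen for this example, and the sample line is the message above joined onto a single line, as it would appear in the log.

```python
# Keys are illustrative labels for the nine fields described above.
FIELD_NAMES = (
    "timestamp_host", "seconds", "category", "program", "subprogram",
    "userid", "groupid", "processid", "message",
)

SAMPLE = (
    "Jul 5 11:39:39 yeats syslog: | 673728379 | Hardware in error state "
    "| Timesharing daemon | /user/prod/fliter.a | 1556 (kjr) "
    "| -1 (<no group>) | 11837 | FatalInterrupt: time sharing detected "
    "error on NI"
)


def parse_cm_error(line):
    """Split one log line into its nine fields.

    Splitting stops after the eighth separator so that any vertical
    line characters inside the error message text are preserved.
    """
    parts = [part.strip() for part in line.split("|", len(FIELD_NAMES) - 1)]
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    record = dict(zip(FIELD_NAMES, parts))
    record["seconds"] = int(record["seconds"])
    return record


if __name__ == "__main__":
    for key, value in parse_cm_error(SAMPLE).items():
        print(f"{key:15} {value}")
```

A line with fewer than nine fields raises ValueError, which is a useful signal that the entry was truncated or came from a different facility.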

Investigating ts-daemon Failures 
with kpndbx 

When the timesharing daemon fails, it can be productive to examine the contents 
of the PN registers, which will often identify which PN(s) caused ts-daemon to 
fail. 

This appendix explains how to extract this information with kpndbx, an extension 
of the UNIX debugging facility, dbx. The procedure for using kpndbx follows. 

1. kpndbx needs certain hardware configuration information to function, 
namely the total number of PNs in the system as well as the range of PNs 
you want it to examine. If you are not certain of these details, use 
cmpartition list -l to display the needed information. 

% /usr/etc/cmpartition list -l 

Appendix M contains the cmpartition man page. 

2. Set the environment variable PN_KERNEL to point to the operating system 
kernel. This usually resides in /usr/etc/kernel.hw. 

% setenv PN_KERNEL /usr/etc/kernel.hw 

3. Set the environment variable CMDIAG_PATH to point to the cmdiag directory. 
This resides in /usr/diag/cmdiag. 

% setenv CMDIAG_PATH /usr/diag/cmdiag 

4. Invoke kpndbx. When kpndbx responds by asking for the partition size, 
enter the total number of PNs in the system. In the following example, 
the system has a total of 256 PNs. 

% /usr/bin/kpndbx 
partition size? 256 

% 

5. Use the set $pnlist [m:n] command to tell kpndbx which PNs to 
examine. [m:n] specifies the physical addresses of the first and last PNs 
in the range, respectively. In the following example, the range includes 
64 PNs, in which the first PN is 128 and the last PN is 191. 

% set $pnlist [128:191] 

6. Use the following commands to tell kpndbx to display a summary of 
the PN status. 

% set $page_size = 0 
% pnsummary all 

7. Examine the resulting summary. An example of this output is shown be¬ 
low with an explanation of its contents following. 

pn number 24: 

running, pc = Blcc, psr = 110010a2, tbr = fSOOOlBO 
CMNA interrupt_cause = 0 

The contents of this example are explained below. 

■ pn number is the relative address of the PN within the partition. 
The physical address of the PN in this example is 152 (128 + 24). 

■ The first entry of the second line indicates the general state of 
the PN at the time of failure. It will show either running or er¬ 
ror. 

■ The rest of the second line displays the contents, in hexadecimal, 
of the three key registers: pc, psr, and tbr. 

■ The third line identifies the level of the interrupt that caused the 
PN to terminate operation. In this case it was interrupt level 0. 

8. Keep the following in mind as you evaluate the set of register states that 
kpndbx displays across the range of PNs. 

■ Ordinarily, the PN that caused ts-daemon to fail will be in the 
error state (error will be displayed instead of running). 

■ If no PN shows the error state, check the tbr values of all PNs. 
If all PNs except one have the same tbr value, the exception is 
probably the failing PN. 

■ If all PNs have the same tbr value, check the pc and psr values. 
Usually, the pc values will be different, but close, across the 
range of PNs. The psr values will mostly be the same across the 
PN range, with two or three different from the rest. Look for a 
PN whose pc and psr values are significantly different; it is 
likely to be the cause of the failure. 

■ If kpndbx does not yield any of these clues, generate more error 
information with cmdiag. Appendix B explains how to use 
cmdiag for this purpose. 
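
The first two checks in step 8 are mechanical and can be applied to a large PN range without reading every entry by eye. The Python sketch below is a hedged illustration of that screening, not a feature of kpndbx. It assumes you have already transcribed each summary into a record holding the PN number, state, and tbr value; the record layout and the function names are hypothetical. The third check, judging whether pc and psr values are "significantly different", is left to the reader, so the sketch returns an empty list when neither of the first two checks finds a suspect.

```python
from collections import Counter


def physical_address(first_pn, relative_pn):
    """Convert a PN number relative to the examined range to a physical address."""
    return first_pn + relative_pn


def suspect_pns(summaries):
    """Apply the first two checks from step 8.

    summaries: list of dicts with keys "pn", "state", and "tbr".
    Returns the PN numbers in the error state if there are any;
    otherwise the single PN whose tbr differs from all the others;
    otherwise an empty list (inspect pc and psr by hand).
    """
    in_error = [s["pn"] for s in summaries if s["state"] == "error"]
    if in_error:
        return in_error
    tbr_counts = Counter(s["tbr"] for s in summaries)
    if len(summaries) > 2 and len(tbr_counts) == 2:
        rare_tbr, rare_count = tbr_counts.most_common()[-1]
        if rare_count == 1:
            return [s["pn"] for s in summaries if s["tbr"] == rare_tbr]
    return []
```

For the example in step 7, physical_address(128, 24) gives 152, matching the address worked out above.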


Appendix B 

Generating Error Information 


This appendix presents the most general procedure for troubleshooting CM-5 
hardware problems. Follow this procedure when you have too little information 
to focus attention on a particular area of the hardware. It is likely to generate 
useful error messages no matter which part in the CM-5 is failing. 

The procedure is presented in two versions. One version is designed for 
troubleshooting hardware within a partition without interfering with user applications 
running on other partitions. This procedure is described in B.1. If the tests 
prescribed in B.1 are not sufficiently exhaustive, follow the full-system procedure 
contained in B.2. This procedure requires access to the complete system; no 
timesharing daemons can be running. 

Figure 13 provides a summary of the partition-contained procedure in 
quick-reference format. Figure 14 provides the same quick-reference information for the 
full-system procedure. 


Partition-Contained Diagnostics 

Introductory Notes 

The system used to illustrate this procedure example has the following attributes. 

- System is named Calliope and has 256 PNs. 

- Diagnostic console is named homer.think.com. 

- Calliope has two 128-PN partitions, which are allocated to partition 
managers named virgil.think.com and milton.think.com. 

- Diagnostics will be run on virgil.think.com partition manager. 


login: user_id 
% su 
password: root_password 
SU homer.think.com /dev/console 
hom# cd /usr/diag/cmdiag 
hom# setenv CMDIAG_PATH /usr/diag/cmdiag 
hom# setenv JTAG_SERVER homer.think.com 

hom# /usr/etc/cmpartition list -l 
CM System "Calliope" 
256 Processors [ 8 Mbytes memory, SPARC IU 
2 Partition Managers 
    virgil.think.com 
    milton.think.com 
Available PN Ranges: 
    All PNs in use 
IOP Addresses 
    480 

Name   Partition Manager    Size   State    Nodes 
V128   virgil.think.com     128    ACTIVE   0-127 
M128   milton.think.com     128    ACTIVE   128-255 
                                            480-480 

hom# rlogin -l root virgil.think.com 
password: QuiVive 
virg# cmpartition stop 

(continued on next page) 


Figure 13. Generating diagnostic information on a partition — 1 of 2 

Partition-Contained Diagnostics 


(continued from previous page) 


[Listing largely lost in reproduction. Surviving fragments show a cmpartition 
list report for partitions V128 (nodes 0-127, description virgil) and M128 
(nodes 128-255, description milton), the IOP entry 480-480, and a diagnostic 
test report.] 


Figure 13. Generating diagnostic information on a partition — 2 of 2. 

System-Wide Diagnostics 


Introductory Notes 

The system used to illustrate this procedure example has the following attributes. 

- System is named Calliope and has 256 PNs. 

- Diagnostic console is named homer.think.com. 

- Calliope has two 128-PN partitions, which are allocated to partition 
managers named virgil.think.com and milton.think.com. 

- Diagnostics will be run on homer.think.com partition. 


login: user_id 
% su 
password: root_password 
SU homer.think.com /dev/console 
hom# cd /usr/diag/cmdiag 
hom# setenv CMDIAG_PATH /usr/diag/cmdiag 
hom# setenv JTAG_SERVER homer.think.com 

hom# /usr/etc/cmpartition list -l 
CM System "Calliope" 
256 Processors [ 8 Mbytes memory, SPARC IU, SPARC FPU ] 
2 Partition Managers 
    virgil.think.com 
    milton.think.com 
Available PN Ranges: 
    All PNs in use 
IOP Addresses 
    480 

Name   Partition Manager    Size   State       Nodes     Description 
V128   virgil.think.com     128    ACTIVE      0-127     virgil 
M128   milton.think.com     128    ALLOCATED   128-255   milton 
                                               480-480 

hom# rlogin -l root virgil.think.com 
password: QuiVive 
virg# cmpartition stop 
virg# cmpartition delete 
virg# exit 

hom# rlogin -l root milton.think.com 
password: QuiVaLa 
milt# cmpartition delete 
milt# exit 

(continued on next page) 


Figure 14. Generating diagnostic information on the full system — 1 of 2 

hom# cmpartition list -l 
CM System "Calliope" 
256 Processors [ 8 Mbytes memory, 
2 Partition Managers 
    virgil.think.com 
    milton.think.com 
Available PN Ranges: 
    All PNs in use 
IOP Addresses 
    480 

Partition Manager    Size   State   Nodes 
virgil.think.com     128            0-127     virgil 
milton.think.com     128            128-255   milton 
                                    480-480 

[Remainder of listing largely lost in reproduction. Surviving fragments: a 
diagnostic test report, the command <CM-DIAG> q, a truncated command beginning 
# /usr/etc, the hostname homer.think.com, and the test names dr, combine, and 
global.] 


System-Wide Diagnostics 


(continued from previous page) 


Figure 14. Generating diagnostic information on the full system — 2 of 2. 


B.1 Running cmdiag within a Partition 

1. Log in to the diagnostic console as root and change directory to 
/usr/diag/cmdiag. 

login: user_id 

% su 

password: root_password 

SU console_name /dev/console 

# cd /usr/diag/cmdiag 

2. Set the cmdiag_path and jtag_server environment variables. The de¬ 
fault CMDIAG_PATH is /usr/diag/cmdiag. Set the JTAG_SERVER variable 
to the diagnostic server hostname. 

# setenv CMDIAG_PATH /usr/diag/cmdiag 

# setenv jtag_server diag_server_hostname 

3. Run cmpartition list -l to be certain you have an accurate understanding
of the current partitioning status of the CM: what partition configurations
are in effect, their names, the hostnames of their partition managers, and
their state of use.

# /usr/etc/cmpartition list -l

4. If cmpartition list -l reports the state of the target partition as ACTIVE,
it means ts-daemon is running on that partition. If so, rlogin to the
appropriate partition manager and run cmpartition stop to halt the timesharing
daemon. Then exit.

# rlogin -l root pm_name

password: root_password

# /usr/etc/cmpartition stop

# exit

5. Run cmpartition list -l again. The target partition should now show
an ALLOCATED status. This means the partition is defined and still associated
with its partition manager, but the timesharing daemon is not running.

6. Run the manufacturing version of the processor chip tests, followed by the
Data Network and Control Network verifiers. Use the -p pm_name option to
restrict these tests to the desired partition. Appendix M explains cmdiag
options in full.

# ./cmdiag -p pm_name

<CM-DIAG> rgroups m pe global broadcast combine dr 
partition 
NOTE: When running cmdiag as root, you must explicitly specify the local
cmdiag search path (precede the cmdiag command with ./).

7. Analyze any error messages and, if the failure source is obvious, take appro¬ 
priate corrective action. If additional diagnostic information is needed, go to 
step 8. 

8. Run find-cm-error; the error system utility will report on any hardware
failures it finds.

<CM-DIAG> find-cm-error 

9. Analyze the find-cm-error output. The next step will depend on the nature 
of the error messages reported in steps 7 and 8. In most cases, you will pro¬ 
ceed along one of the following lines. 

■ If a single component is identified as failing (most likely at the leaf
node level), simply replace the field-replaceable unit on which it 
resides. Appendix C discusses this path in more detail. 

■ If a Control Network failure is reported, the source of the failure 
may be ambiguous. Appendix D explains how to parse Control Net¬ 
work error reports to isolate the fault to a single component or inter¬ 
connect path. 

■ Data Network error reports are less ambiguous than Control Net¬ 
work error messages, but do need special analysis. Appendix E ex¬ 
plains how to troubleshoot Data Network failures. 

■ If the failure symptoms indicate an I/O-related failure, use the
cmdiag I/O tests that can be run within a partition to evaluate the
I/O hardware more closely. Appendix F identifies which I/O
tests are partition-contained. Appendix H describes the procedure
for using these tests.


B.2 Running cmdiag on the Full System 

1. Log in to the diagnostic console as root and change directory to
/usr/diag/cmdiag.

login: user_id
% su
password: root_password
SU console_name /dev/console

# cd /usr/diag/cmdiag

2. Set the CMDIAG_PATH and JTAG_SERVER environment variables. The default
CMDIAG_PATH is /usr/diag/cmdiag. Set the JTAG_SERVER variable
to the diagnostic server hostname.

# setenv CMDIAG_PATH /usr/diag/cmdiag

# setenv JTAG_SERVER diag_server_hostname

NOTE: No user activity will be possible beginning with the next step. 

3. Stop and delete all partitions. To do this, you need to know the hostname of
each partition manager to which a partition is allocated. If necessary, run
cmpartition list -l to get this information.

# /usr/etc/cmpartition list -l

4. Then run cmpartition stop and cmpartition delete on every partition
manager that has an active partition. Run cmpartition delete on
every partition manager that has an allocated partition.

For example, if cmpartition list shows virgil.think.com as
active and milton.think.com as allocated, do the following.

# rlogin -l root virgil.think.com
password: root_password

virg# /usr/etc/cmpartition stop
virg# /usr/etc/cmpartition delete
virg# exit

# rlogin -l root milton.think.com
password: root_password

milt# /usr/etc/cmpartition delete
milt# exit

#

5. Run cmpartition list -l again. It should report no partitions as either
ACTIVE or ALLOCATED.

6. Run the manufacturing level of the JTAG test group.

# ./cmdiag -C
<CM-DIAG> rgroups m SHI 
<CM-DIAG> rgroups m CLKDN 
<CM-DIAG> rgroups m clkbof 
<CM-DIAG> rgroups m SPI 
<CM-DIAG> rgroups m FILLER 
<CM-DIAG> rgroups m PE PEMEM 
<CM-DIAG> rgroups m CN
<CM-DIAG> rgroups m DR


NOTE: When running cmdiag as root, you must explicitly specify the local
cmdiag search path (precede the cmdiag command with ./).

7. Analyze any error messages and, if the failure source is obvious, take appro¬ 
priate corrective action. See step 11 for guidance. 

If this step does not provide sufficient information, go to step 8. 

8. Create a partition that encompasses all PNs in the system. Enter the lowest
and highest PN addresses for first_pn and last_pn, respectively.

<CM-DIAG> q

# /usr/etc/cmpartition create -pn_range first_pn-last_pn

9. Execute a system reset and reset the partition manager’s interface module. 
Then run the processor chip tests, followed by the Data Network and Control 
Network verifiers. 

# cmreset

# cmreset -s 

# ./cmdiag -C -p pm_name 

<CM-DIAG> rgroups m pe dr combine global broadcast
partition 

10. Analyze any error messages and, if the failure source is obvious, take
appropriate corrective action. If additional diagnostic information is needed,
run find-cm-error again and go on to step 11.

11. Analyze the find-cm-error output. The next step will depend on the nature
of the error messages reported in step 7. In most cases, you will proceed along
one of the following lines.

■ If a single component is identified as failing (most likely at the leaf
node level), simply replace the field-replaceable unit on which it
resides. Appendix C discusses this path in more detail.

■ If a Control Network failure is reported, the source of the failure 
may be ambiguous. Appendix D explains how to parse Control Net¬ 
work error reports to isolate the fault to a single component or inter¬ 
connect path. 
■ Data Network error reports are less ambiguous than Control Network
error messages, but do need special analysis. Appendix E explains
how to troubleshoot Data Network failures.

■ If the failure symptoms indicate an I/O-related failure, run the I/O
tests described in Appendix F to evaluate the I/O hardware more
closely. Appendix H describes the procedure for using these tests.
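
The stop-and-delete rule in steps 3 and 4 of Section B.2 can be written down as a small planning function. The sketch below is illustrative only. It is written in Python, which is not part of the CM-5 software; the function name teardown_plan and the (hostname, state) input format are assumptions made for this example. It does not parse cmpartition output. It only maps each partition manager's reported state to the cmpartition subcommands the procedure calls for.

```python
# Hypothetical helper (not part of cmdiag): maps partition states to the
# cmpartition subcommands required before full-system diagnostics.

def teardown_plan(partitions):
    """partitions: list of (pm_hostname, state) pairs, where state is the
    value reported by 'cmpartition list -l' (ACTIVE or ALLOCATED)."""
    plan = []
    for host, state in partitions:
        state = state.upper()
        if state == "ACTIVE":
            # Timesharing daemon is running: stop it, then delete.
            cmds = ["/usr/etc/cmpartition stop",
                    "/usr/etc/cmpartition delete"]
        elif state == "ALLOCATED":
            # Partition is defined but idle: delete only.
            cmds = ["/usr/etc/cmpartition delete"]
        else:
            # Any other state needs no action in this sketch.
            cmds = []
        if cmds:
            plan.append((host, cmds))
    return plan


if __name__ == "__main__":
    example = [("virgil.think.com", "ACTIVE"),
               ("milton.think.com", "ALLOCATED")]
    for host, cmds in teardown_plan(example):
        print("rlogin -l root", host)
        for cmd in cmds:
            print("   ", cmd)
```

Run against the example in step 4, the plan lists stop and delete for virgil.think.com and delete alone for milton.think.com, which matches the console session shown there.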
Failures at the Leaf Node Level


C.1 Overview 

When error messages from kpndbx or find-cm-error point to a specific PN,
corrective action is straightforward. The procedure in such cases is summarized 
below. 

1. Identify the PN module on which the faulty component resides and de¬ 
termine its location in the system (cabinet, backplane, slot). 

2. Replace the unit (see Section C.2 for board replacement procedure). 

3. Run the complete diagnostics test suite to verify that the system is now 
fully functional. Remember that the timesharing daemon must not be 
running in the partition. 

# ./cmdiag -m -p pm_name

4. If any errors are reported, go to Appendix B for further guidance. If no 
errors are reported, the system can be returned to regular service. 


C.2 PN Board Replacement Procedure

(to be supplied) 


Appendix D

Tracing Control Network Errors 


D.1 Introduction 

When a component or interconnect path in the Control Network fails, the failure 
usually propagates through the network. As a result, cmdiag reports that many
CN nodes have failed: every node containing an erroneous checksum value.

This appendix explains how to analyze find-cm-error reports to quickly narrow
the search to no more than two nodes and their interconnecting signal path.
Then, corrective action simply becomes a process of elimination among those 
three candidates. 

Isolating faults in the Control Network depends on having a clear understanding 
of how errors propagate through the network. The following concepts are key to
this understanding.

■ Similar to message broadcasting: Error status messages propagate
through the Control Network in a manner analogous to standard message
broadcasting. An important exception to this analogy, however, is that
error messages begin at the point where the error is detected. Unlike
ordinary broadcasts, which always begin at a leaf node, error messages can
originate at any level in the network.

■ Up errors and down errors: When an error is detected before it reaches
the partition’s root node, it is flagged as an up error. The CN node that
detects the error generates an error status and forwards it upward to the
root. The root node then broadcasts it downward to all nodes in the
partition. In such cases, one or more nodes will report up errors and every
node in the partition will report a down error. Figure 15 shows an example
of this.

■ Down errors only: When an error is first detected in a downward path
(no up error is detected), the error status is broadcast to all nodes below
the node that first detected the error. In this case, no up errors are
reported, and the root never sees the error. See Figure 16 for an example.

Keep these concepts in mind as you go through the procedure presented in Sec¬ 
tion D.2. 
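
The two propagation patterns described above can be modeled on a toy tree. The following Python sketch is an illustration under stated assumptions, not a model of the real Control Network: it assumes a simple binary tree numbered heap-style (root is node 0, the children of node i are 2i+1 and 2i+2), and it assumes that every node on the path from the detecting node to the root reports an up error. The function name propagate is invented for this example.

```python
# Toy model of Control Network error propagation (illustrative only).
# Assumption: binary tree, heap numbering, root = 0.

def propagate(num_nodes, detector, direction):
    """Return (up_reporters, down_reporters) as sets of node indices."""

    def subtree(top):
        # Collect 'top' and every node below it.
        found, stack = set(), [top]
        while stack:
            node = stack.pop()
            if node < num_nodes:
                found.add(node)
                stack.extend((2 * node + 1, 2 * node + 2))
        return found

    if direction == "up":
        # The error status travels from the detector up to the root...
        up = set()
        node = detector
        while True:
            up.add(node)
            if node == 0:
                break
            node = (node - 1) // 2
        # ...and the root then broadcasts it down to the whole partition.
        return up, subtree(0)

    if direction == "down":
        # No up errors; only the detector and the nodes below it see it.
        return set(), subtree(detector)

    raise ValueError("direction must be 'up' or 'down'")


if __name__ == "__main__":
    print(propagate(7, 4, "up"))
    print(propagate(7, 2, "down"))
```

In the first call every node reports a down error and the up errors trace a path to the root; in the second the root is absent from both sets. That difference is what step 2 of the procedure in Section D.2 uses to choose between the UP ERRORS and DOWN ERRORS ONLY branches.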


D.2 Fault Isolation Procedure 

NOTE: For this procedure, you will need readable copies of the CN cable
assembly drawings for all levels of the network in your system. You will also
need a large surface (e.g., a table) on which to spread these drawings open.

1. Lay out the CN cable assembly drawings in a location where you can 
also read the error system report. 

2. Examine the find-cm-error output, looking for up errors. If there are
any up errors, perform the steps labeled UP ERRORS, beginning with
step 3. If there are no up errors, go to step 13.

3. UP ERRORS: On the cable assembly drawings, locate all CN components
that show up-error status in the find-cm-error report. Ignore
components with only down errors.

4. UP ERRORS: In the set of components reporting up errors, find the
component that is at the lowest point in the network tree. When you 
identify this component, you will have narrowed the search to the fol¬ 
lowing elements. Figure 15 illustrates this. 

■ this component 

■ the set of components that are its immediate children 

■ the paths that connect the children to this parent 

NOTE: The component’s children are implicated because a faulty compo¬ 
nent will not necessarily generate its own error status. Therefore, the first 
component to report an error may actually be reflecting an error that 
originated in one of its children. 

The next step is to narrow the search further. 
5. UP ERRORS: Analyze the error message for the component selected
in step 4. This message will ordinarily tell you which of the internal
nodes (nodes 0, 1, and 2) detected the failure. It may also associate the
failure with a particular child path connected to the node.

This time you want to find the lowest level node that is reporting an up 
error status and, if possible, which child path (left or right) is associated 
with that error. When you have determined this, you will have narrowed 
your search to the following elements. Figure 17 illustrates this. 

■ this node 

■ the child node connected to this node by the path implicated in 
the report 

■ or the interconnecting path 

NOTE: If a particular path is not specified in the error report, you may
need to involve both children (left and right) in the process of
elimination. This is shown in Figure 18.

6. UP ERRORS: Identify the circuit board(s) that contain the parent and
child nodes identified in step 5. If these nodes are on different boards,
identify the interconnecting cable as well.

NOTE: If the parent node identified in step 5 is node 1, all suspect
elements reside within the same component: the primary node (node 1),
both children (nodes 0 and 2), and the interconnecting paths. In this case,
the next steps will involve only one circuit board.

7. UP ERRORS: Replace the circuit board containing the parent node and
rerun the cmdiag test that exposed the error.

8. UP ERRORS: If the test reports no errors, the board replacement may
have corrected the problem. Verify by running cmdiag -f.

If the system passes the complete manufacturing version of cmdiag,
return the system to regular operation. If errors are reported, go to step 9.

9. UP ERRORS: Restore the original board containing the parent node to
its slot. Then replace the circuit board containing the implicated child
node and rerun the test that reported the error.

NOTE: If find-cm-error does not implicate either child path (see
Figure 18), choose one child node to replace first. If the test continues
to fail, restore the board just removed, replace the other child node
board, and run the test again.

10. UP ERRORS: If the test reports no errors, replacing the child node
board may have corrected the problem. Verify by running cmdiag -m.

If the system passes the complete manufacturing version of cmdiag,
return the system to regular operation. If errors are reported, go to step 11.

11. UP ERRORS: Examine the connectors on the cable identified in step
5. If you find bent or damaged pins, repair if possible. If repair is not
possible, replace the cable. In either case, run the cmdiag test that
reported the error when you finish working on the cable.

NOTE: If find-cm-error does not implicate either child path (Figure
18), examine both. Then repair/replace them one at a time, running the
failing test in between.

12. UP ERRORS: If no errors are reported, repairing/replacing the cable
may have corrected the problem. Verify by running cmdiag -f.

If the system passes the complete manufacturing version of cmdiag,
return the system to regular operation. If errors are reported, call
Cambridge for assistance.

13. DOWN ERRORS ONLY: On the CN cable assembly drawings, locate the
components that show down error status in the find-cm-error report.

14. DOWN ERRORS ONLY: In the set of components identified in step 13,
find the component that is at the highest point in the network tree. When
you identify this component, you will have narrowed the search to the
following elements. Figure 16 illustrates this.

■ this component

■ the set of components that are its immediate parents

■ the paths that connect the parents to this child

NOTE: The component’s parents are implicated because a faulty component
will not necessarily generate its own error status. Therefore, the first
component to report an error may actually be reflecting an error that
originated in one of its parents.

The next step is to narrow the search further. 

15. DOWN ERRORS ONLY: Analyze the error message for the component
selected in step 14. This message will ordinarily tell you which of the
internal nodes (nodes 0, 1, and 2) detected the failure. It may also
associate the failure with a particular parent path connected to this node.

This time you want to find the highest level node that is reporting a down
error status and, if possible, which parent path (left or right) is associated
with that error. When you have determined this, you will have narrowed
your search to the following elements. Figure 19 illustrates this.

■ this node

■ the parent node connected to this node by the path implicated in
the report

■ or the interconnecting path itself

NOTE: If a particular path is not specified in the error report, you may
need to involve both parents (left and right) in the process of elimination.
This is shown in Figure 20.

16. DOWN ERRORS ONLY: Identify the circuit board(s) that contain the
parent and child nodes identified in step 15. If these nodes are on
different boards, identify the interconnecting cable as well.

NOTE: If the child node identified in step 15 is either node 0 or node 2,
all suspect elements reside within the same component: the primary node
(node 0 or 2), its parent (node 1), and the interconnecting path. In
this case, the next step will involve only one circuit board.

17. DOWN ERRORS ONLY: Replace the circuit board containing the child
node and rerun the test that detected the error.

18. DOWN ERRORS ONLY: If the test reports no errors, replacing the board
may have corrected the problem. Verify by running cmdiag -f.

If the system passes the complete manufacturing version of cmdiag, re¬ 
turn the system to regular operation. If errors are reported, go to step 19. 

19. DOWN ERRORS ONLY: Restore the original board containing the child node
to its slot. Then replace the circuit board containing the implicated
parent node and rerun the test that reported the error.

NOTE: If find-cm-error does not implicate either parent path (see
Figure 20), choose one parent node to replace first. If the test continues
to fail, restore the board just removed, replace the other parent node
board, and run the test again.

20. DOWN ERRORS ONLY: If the test reports no errors, replacing the board
may have corrected the problem. Verify by running cmdiag -f.

If the system passes the complete manufacturing version of cmdiag,
return the system to regular operation. If errors are reported, go to step 21.

21. DOWN ERRORS ONLY: Examine the connectors on the cable identified
in step 14. If you find bent or damaged pins, repair if possible. If repair
is not possible, replace the cable. In either case, run the cmdiag test that
reported the error when you finish working on the cable.

NOTE: If find-cm-error does not implicate either parent path (see
Figure 20), examine both. Then repair/replace them one at a time,
running the failing test in between.

22. DOWN ERRORS ONLY: If no errors are reported, the cable may have
been the source of the problem. Verify by running cmdiag -f.

If the system passes the complete manufacturing version of cmdiag,
return the system to regular operation. If errors are reported, call
Cambridge for assistance.
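
The selection rule at the start of each branch (steps 4 and 14) reduces to a minimum or maximum over tree level. The Python sketch below is a minimal illustration under stated assumptions: the report format, a tuple of (component, level, has_up_error, has_down_error) with level 0 at the leaf level, is invented for this example and is not the find-cm-error output format, and the function name first_suspect is hypothetical.

```python
# Illustrative selection of the starting component for Control Network
# fault isolation. The input format is an assumption for this sketch.

def first_suspect(reports):
    """reports: list of (component, level, has_up_error, has_down_error).
    level is the height in the network tree; 0 is the leaf level.
    Returns (component, branch, relatives_to_check) or None."""
    up = [r for r in reports if r[2]]
    if up:
        # UP ERRORS branch: lowest component reporting an up error.
        lowest = min(up, key=lambda r: r[1])
        return lowest[0], "up", "children"
    down = [r for r in reports if r[3]]
    if down:
        # DOWN ERRORS ONLY branch: highest component reporting a down error.
        highest = max(down, key=lambda r: r[1])
        return highest[0], "down", "parents"
    return None


if __name__ == "__main__":
    mixed = [("cn-a", 3, True, True),
             ("cn-b", 2, True, True),
             ("cn-c", 1, False, True)]
    print(first_suspect(mixed))
```

When several components share the same level, min and max return the first one listed, so in that case each of the tied components should be examined by hand.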
Figure 15. Component-level analysis: up and down errors

Figure 16. Component-level analysis: down errors only

Figure 19. Node-level analysis: down errors only

Tracing Data Network Errors




When a component or interconnect path in the Data Network fails, the failure 
will propagate through the network along a single path. The result is a set of Data 
Network nodes all reporting error status. 

This appendix explains how to use find-cm-error to isolate Data Network 
faults to no more than two components and their interconnecting signal path. At 
that point, you can identify and replace the faulty part through a process of elimi¬ 
nation. 

The Data Network troubleshooting procedure is described below. 

NOTE: For this procedure, you will need legible copies of the Data Network cable
assembly drawings for all levels of the network in your system. You will also
need a large surface (e.g., a table) on which to spread these drawings open.

NOTE: This procedure is based on the assumption that you have already run
cmdiag tests and have executed find-cm-error.

1. Lay out the network cable assembly drawings in a location where you 
can also read the error system report. 

2. Examine the find-cm-error output, looking for the network node
reporting a FATAL error status.

NOTE: The first DR component to detect the error will record a unique 
error status, called a FATAL error. All subsequent DR components in the 
error message path will report SOFT errors. 

3. Locate the fatal-error component on the Data Network cable assembly
drawings. Ignore all soft-error components. Note which parent or child
port is implicated in the fatal error. Figure 21 shows an example of this.

At this point, you will have narrowed the search to the following
elements. Figure 22 illustrates the node-level view of these elements.

■ first suspect: the fatal-error component

■ second suspect: the component at the other end of the parent
or child path associated with the fatal error

■ third suspect: the implicated path itself

NOTE: The component at the other end of the implicated path is suspect
because a failing component will not necessarily generate its own fatal 
error status. Therefore, the fatal-error component may actually reflect an 
error originating in a parent or child connected to it. 

4. Identify the circuit board(s) that contain the components identified in
step 3. If these components are on different backplanes, identify the
interconnecting cable that was implicated in step 3 as well.

5. Replace the circuit board containing the fatal-error component and rerun 
the cmdiag test that exposed the error. 

6. If cmdiag reports no errors, the board replacement may have corrected 
the problem. Verify by running cmdiag -f.

If the system passes the complete manufacturing version of cmdiag, re¬ 
turn the system to regular operation. If errors are reported, go to step 7. 

7. Restore the original board containing the fatal-error component to its 
slot. Then replace the board containing the second suspect (the parent or 
child component) and rerun the test that reported the error. 

8. If cmdiag reports no errors, the board replacement may have corrected 
the problem. Verify by running cmdiag -f.

If the system passes cmdiag -f, return the system to regular operation. 
If errors are reported, go to step 9. 

9. Examine the connectors on the cable identified in step 3. If you find bent
or damaged pins, repair if possible. If repair is not possible, replace the
cable. In either case, rerun the test that reported the error when you finish
working on the cable.

If no errors are reported, return the system to regular operation. If errors 
are reported, call Cambridge for assistance. 
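
The suspect ordering used in steps 3 through 9 can be summarized in a few lines of Python. This is a sketch under stated assumptions, not a parser for find-cm-error output: the report tuples, the wiring dictionary (standing in for what you read off the cable assembly drawings), and the function name dr_suspects are all invented for illustration. It also assumes a single fault, so that exactly one component reports FATAL status.

```python
# Illustrative ordering of Data Network suspects. Input formats are
# assumptions for this sketch, not the real error report format.

def dr_suspects(reports, wiring):
    """reports: list of (component, status, port); status is FATAL or SOFT.
    wiring: dict mapping (component, port) to the component at the far end.
    Returns [first suspect, second suspect, third suspect]."""
    # SOFT reports only mark the error message path and are ignored.
    fatal = [r for r in reports if r[1] == "FATAL"]
    if len(fatal) != 1:
        raise ValueError("expected exactly one FATAL report, found %d"
                         % len(fatal))
    component, _, port = fatal[0]
    neighbor = wiring[(component, port)]
    path = "path %s:%s <-> %s" % (component, port, neighbor)
    return [component, neighbor, path]


if __name__ == "__main__":
    reports = [("dr-3", "SOFT", "parent0"),
               ("dr-7", "FATAL", "child1"),
               ("dr-9", "SOFT", "parent0")]
    wiring = {("dr-7", "child1"): "dr-12"}
    print(dr_suspects(reports, wiring))
```

The returned order matches the replacement order in the procedure: the board with the fatal-error component first, the board at the other end of the implicated port second, and the cable last.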
Figure 22. Data Network node-level analysis
I/O Diagnostic Tools



F.1 Introduction 

This Appendix describes the diagnostic tools that the CM provides for
troubleshooting I/O problems. Table 2 identifies the I/O test programs and
summarizes the diagnostic coverage they provide. Figure 23 shows where the
various diagnostic programs execute.

These I/O test programs are described more fully in the following sections.

■ Section F.2, IOBA Internal Tests

■ Section F.3, CM-Based Verifiers

■ Section F.4, DataVault Internal Tests

■ Section F.6, CM-IOPG Internal Tests and Verifiers

■ Section F.7, CM-HIPPI Internal Tests and Verifiers

Appendix H explains how to use these diagnostic tools in typical I/O trouble¬ 
shooting activities. 

NOTE: SDA diagnostics are described in the SDA Field Service Guide. 
Table 2. CM-5 I/O Diagnostic Tools.

Test Name                     Execution               Summary
(by category)                 Environment

IOBA internal diagnostics

IOCLK, IODR, IOCNTRL,         /usr/diag/cmdiag        JTAG scan tests that check signal
IOBUF, IOCHNL                 Execute on CM-5         paths on the IOBA boards; each test
                              diagnostic server.      name indicates the board it covers.

IOP-CNTRL, IOP-CHNL,          /usr/diag/cmdiag        Set of functional tests that exercise
IOP-BUF, IOP-SYS              Execute on CM-5         specific sections of IOBA hardware.
                              diagnostic server.

CM-based verifiers

execute-all-iopdv-tests       /usr/diag/cmdiag        IOBA writes test data to the
                              Execute on CM-5         DataVault and reads it back; this
                              diagnostic server.      program verifies cabling between
                                                      the CM-5 and the DataVault.

execute-all-iopps-tests       /usr/diag/cmdiag        PNs write test data to IOBA buffers
                              Execute on CM-5         and read it back; this program
                              diagnostic server.      verifies data and control paths
                                                      between the IOBA and the DataVault.

test-cmio-device-data-xfer    /usr/diag/cmdiag        PNs write test data to all I/O devices
                              Execute on CM-5         listed in the io.conf file and read
                              diagnostic server.      it back; this program verifies all
                                                      relevant control and data paths in the
                                                      I/O subsystem; you can select
                                                      individual tests to focus attention on
                                                      specific sections of hardware.

dvtest5-vu                    /usr/local/etc          Emulates I/O-intensive user
dvtest5-sparc                 Execute on CM-5         applications; functions include opening
                              diagnostic server.      files and directories, selecting an SDA
                                                      or IOBA I/O Processor, writing test data
                                                      from PNs to target DataVaults or
                                                      CM-IOPGs and reading it back.
                                                      dvtest5-vu requires PNs with
                                                      vector units. dvtest5-sparc can
                                                      be run on systems with or without
                                                      vector units.

hippi-loop5                   /usr/local/etc          Resembles dvtest5-sparc, except
                              Execute on CM-5         it reads and writes a CM-HIPPI.
                              diagnostic server.
Table 2. CM-5 I/O Diagnostic Tools (continued)

Test Name                     Execution               Summary
(by category)                 Environment

DataVault internal diagnostics

dvdiag                        /usr/local/etc/diag     A diagnostic package that tests the
                              Execute on DataVault    functionality of all internal
                              diagnostic server.      DataVault hardware. It includes a
                                                      loopback test that checks the
                                                      DataVault's CMIO bus interface.

CM-IOPG internal diagnostics

viodiag                       /usr/local/etc          A diagnostic program that tests the
                              Execute on CM-IOPG      CM-IOPG's internal hardware.
                              diagnostic server.

CM-IOPG verifiers

aerial                        /usr/local/etc          Writes test data from the CM-IOPG
                              Execute on CM-IOPG      to a DataVault and reads it back; this
                              diagnostic server.      program verifies the data and control
                                                      paths (including CMIO bus) between
                                                      the CM-IOPG and the DataVault.

TapeD 1 Vxfflrvfr             /usr/local/etc          Writes test data from the CM-TUD
                              Execute on CM-IOPG      to a DataVault and reads it back; this
                              diagnostic server.      program verifies the data and control
                                                      paths (including CMIO bus) between
                                                      the tape system and the DataVault.

CM-HIPPI internal diagnostics

iopdiag, dstdiag,             /usr/local/etc/diag     Set of functional tests that diagnose
srcdiag, sysdiag              Execute on CM-HIPPI     specific sections of CM-HIPPI
                              diagnostic server.      hardware.
Figure 23. CM-5 I/O test programs: platform and coverage summary.

F.2 IOBA Internal Diagnostics

cmdiag contains nine test groups that specifically target IOBA functionality.
These test groups are listed below and described in Sections F.2.1 through F.2.9.
The first five are JTAG tests; the other four are functional tests that exercise
specific sections of IOBA hardware.


IOCLK        JTAG          Section F.2.1
IODR         JTAG          Section F.2.2
IOCNTRL      JTAG          Section F.2.3
IOBUF        JTAG          Section F.2.4
IOCHNL       JTAG          Section F.2.5
IOP-CNTRL    Functional    Section F.2.6
IOP-BUF      Functional    Section F.2.7
IOP-CHNL     Functional    Section F.2.8
IOP-SYS      Functional    Section F.2.9


NOTE: In addition to these nine I/O-specific test groups, the Data Network
verifier, dr, provides some coverage of I/O functionality as well. It tests the
IOBA’s IOCNTRL board as if it were a Processing Node attached to the network.

Test groups focus on particular boards or subsystems in the IOBA; the test names 
indicate their respective coverage. Figure 24 illustrates the IOBA board configu¬ 
ration in block diagram form. 

Invoke these tests in the cmdiag shell using the rgroups command with m 
(manufacturing version) flag. For example: 

<CM-DIAG> rgroups m IOP-SYS

runs verifier tests on the IOSYS board. 
Figure 24. IOBA basic block diagram.

F.2.1 IOCLK (JTAG)

This group runs boundary scan tests on all JTAG-accessible IOCLK board compo¬ 
nents. 

F.2.2 IODR (JTAG) 

This group runs boundary scan tests on all JTAG-accessible IODR board
components. If the IOBA contains multiple IODR boards, all are tested in parallel.

F.2.3 IOCNTRL (JTAG) 

This group runs boundary scan tests on all JTAG-accessible IOCNTRL board 
components. 

F.2.4 IOBUF (JTAG)

This group runs boundary scan tests on all JTAG-accessible IOBUF board compo¬ 
nents. If an IOBA contains multiple IOBUF boards, all are tested in parallel. 

F.2.5 IOCHNL (JTAG)

This group runs boundary scan tests on all JTAG-accessible IOCHNL board
components. If an IOBA contains multiple IOCHNL boards, all are tested in parallel.

F.2.6 IOP-CNTRL (Functional) 

This group runs a set of functional tests on the IOBA Control board (IOCNTRL). 
Its primary diagnostic focus is X3US operations and related functionality. 

NOTE: Run the cmdiag pe test group first to verify Processing Node
functionality before running iop-cntrl.
F.2.7 IOP-BUF (Functional)

This group tests the functionality of the IOBA Buffer board (IOBUF). The tests will
begin with the first IOBUF board they encounter and then step through all
subsequent IOBUF boards in the IOBA. The tests will automatically repeat until all
IOBUF boards in the IOBA have been tested.

In a system with multiple IOBAs, these tests will be run on all IOBAs in parallel.
If the IOBAs have different numbers of IOBUF boards, the tests will repeat until
all IOBUF boards in the largest set have been tested. Consequently, some IOBUF 
boards in the smaller sets will experience redundant testing as iop-buf wraps 
around to the first IOBUF board in the set. 
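
The wraparound behavior described above is simple modular arithmetic. The Python sketch below is a minimal illustration, assuming the IOBAs advance in lockstep, one IOBUF board per round, and that every IOBA has at least one IOBUF board. The function name iopbuf_passes and the IOBA labels are hypothetical.

```python
# Illustrative count of how often each IOBUF board is tested when IOBAs
# with different board counts are exercised in parallel.

def iopbuf_passes(boards_per_ioba):
    """boards_per_ioba: dict mapping an IOBA label to its IOBUF board count.
    Returns a dict mapping each label to a list of per-board test counts."""
    # The run lasts as many rounds as the largest set has boards.
    rounds = max(boards_per_ioba.values())
    result = {}
    for ioba, count in boards_per_ioba.items():
        tested = [0] * count
        for step in range(rounds):
            # Smaller sets wrap around to their first board.
            tested[step % count] += 1
        result[ioba] = tested
    return result


if __name__ == "__main__":
    print(iopbuf_passes({"ioba0": 4, "ioba1": 3}))
```

With four boards in one IOBA and three in the other, the run lasts four rounds, so the first board of the smaller set is tested twice while every other board is tested once.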


F.2.8 IOP-CHNL (Functional)

This group tests the functionality of the IOBA Channel board.

These tests require the environment variable IOP_STATION_ID to be set before
they are run. This variable takes the form IOPXXX_STATION_ID, where XXX is the
physical network address of the IOP (in decimal).

If this variable is not already set at the time you invoke iop-chnl, you will be
asked to supply it. If there are multiple IOPs, you will be prompted for each
undefined IOP station ID. The following example illustrates the dialog for a system
with two IOPs at address locations 480 and 490.

<CM-DIAG> rgroups m IOP-CHNL

IOPXXX_STATION_ID? 480

IOPXXX_STATION_ID? 490


; tests execute 


<CM-DIAG> 


NOTE: If you do not know the IOP address(es), run cmpartition list -l.
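
To see in advance which station ID variables cmdiag will prompt for, the variable names can be derived from the IOP addresses. The Python sketch below is illustrative only and rests on an assumption: that XXX in IOPXXX_STATION_ID is replaced by the decimal address with no padding (for example, IOP480_STATION_ID). The function name station_id_vars is hypothetical, and the sketch does not say what value each variable should hold.

```python
# Illustrative check of which IOPXXX_STATION_ID variables are defined.
# Assumption: XXX is the unpadded decimal network address of the IOP.

import os


def station_id_vars(iop_addresses, environ=None):
    """Return a list of (variable_name, is_set) pairs, one per IOP address."""
    env = os.environ if environ is None else environ
    pairs = []
    for address in iop_addresses:
        name = "IOP%d_STATION_ID" % address
        pairs.append((name, name in env))
    return pairs


if __name__ == "__main__":
    for name, is_set in station_id_vars([480, 490]):
        print(name, "set" if is_set else "not set (expect a prompt)")
```

The IOP addresses passed in would come from the IOP Addresses section of the cmpartition list -l report.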
F.2.9 IOP-SYS (Functional)

This test group focuses primarily on PBUS operations. Related functionality on
other IOBA boards is also tested as a by-product of the PBUS exercises.

These tests require the environment variable IOPXXX_STATION_ID to be set
before they are run. See Section F.2.8, IOP-CHNL, for details.


F.3 CM-Based Verifiers 

This section describes several system-level verifiers that arc executed on the 
CM’s diagnostic server or from a partition running a timesharing daemon. 

The main service provided by each of these the verifiers is to transfer test data 
to and from a peripheral device, thereby exercising most or all of the functional¬ 
ity that is needed by user I/O. Data patterns are varied to locate bit-sensitive faults 
more readily. 

The choice of which verifiers to use can be influenced by a number of 
considerations, including: the type of peripheral involved, the tradeoff between 
test rigor and time available to test, and personal preference. A brief summary 
of each verifier is provided below. Additional detail is presented in Sections 
B.3.1 through B.3.4. 

■ Focused CM-to-DataVault Verifiers: This consists of two complementary 
verifiers, each of which exercises a separate segment of the CM-to-DataVault 
path. One transfers test data between the IOBA and the DataVault; 
the other transfers test data between the PNs and the IOBA. The second 
verifier also writes test data from the PNs out to the DataVault and reads 
it back to check the I/O data and control paths across their full length. 
These verifiers are described in Section B.3.1. 

■ End-to-End Tests: This refers to a package of three independent verifiers, 
each tailored for a different type of peripheral: a DataVault, CM-IOPG, 
or CM-HIPPI. The user interface provides considerable control over details 
of the test data transfers. All transfers are between the PNs and the 
appropriate peripheral device. These are described in Section B.3.2. 

■ dvtest5-vu, dvtest5-sparc: This program performs a comprehensive 
emulation of an I/O-intensive user application, including verification 
of the file system software. It can target either a DataVault or CM-IOPG 
as its peripheral device. dvtest5-vu and dvtest5-sparc are described 
in Section B.3.3. 

■ hippi-loop5: This verifier is equivalent to dvtest5-sparc, except it 
targets CM-HIPPIs. It is described in Section B.3.4. 


B.3.1 Focused CM-to-DataVault Verifiers 

cmdiag includes two complementary test programs that transfer blocks of test 
data between the CM and a DataVault. Each program focuses on a different 
section of the PN-to-DataVault I/O path. 

■ execute-all-iopdv-tests — This program writes various data patterns 
from the IOBA to the DataVault, reads the data back, and compares 
the read data with the data that was sent. 

■ execute-all-ioppe-tests — This program has two phases. First, it 
writes test data from the PNs to the IOBA buffers and reads it back, 
verifying that segment of the I/O path. Then, the PNs write test data all the 
way out to the DataVault and read it back, verifying the I/O path’s full length. 

If this test is run on a partition, it uses all PNs in its partition. If it is run 
on the entire system, it uses all PNs found by autosizing. 

NOTE: The PNs used by execute-all-ioppe-tests must be logically 
contiguous (within a continuous address range), and the number of PNs 
must be a power of two. 
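
The two constraints in the note above can be checked mechanically. The Python sketch below is a hypothetical pre-check, not a cmdiag facility; it assumes that "logically contiguous" means consecutive integer PN addresses, and the address values used are made up for illustration.

```python
def valid_pn_set(addresses):
    """Return True if the PN addresses form a contiguous range whose
    size is a power of two (the two constraints stated in the note)."""
    count = len(addresses)
    if count == 0:
        return False
    # A positive integer is a power of two when it has exactly one bit set.
    power_of_two = (count & (count - 1)) == 0
    ordered = sorted(set(addresses))
    contiguous = (len(ordered) == count
                  and ordered[-1] - ordered[0] + 1 == count)
    return power_of_two and contiguous


print(valid_pn_set(list(range(64, 96))))  # 32 consecutive addresses
print(valid_pn_set(list(range(0, 24))))   # 24 is not a power of two
print(valid_pn_set([0, 1, 2, 4]))         # four PNs, but address 3 is missing
```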

Three environment variables must be set before either test can be run. These are 
described below. 

■ IOPXXX_STATION_ID specifies the station ID of the IOP to be used in the 
test. XXX is the IOP’s physical network address (in decimal). If this variable 
is not already set, you will be prompted for it. If you are uncertain about 
the system’s IOP addresses, run cmpartition list -l. 

■ IOPXXX_DV_START_BLOCK specifies the starting block address to which 
the IOBA will write test data patterns. Choose any number from 1 to 960. 
If this value is not already set, you will be prompted to supply it. 

NOTE: The maximum start address (960) is determined by the size of the 
region on the DataVault that is reserved for diagnostic use and by the 
largest number of blocks written by these tests. The DataVault’s diagnostic 
zone measures 1024 blocks, and the largest number of blocks written by 
these tests is 64; consequently, 1023 - 63 = 960. 

■ IOPXXX_DV_STATION_ID must match the station ID of the DataVault port 
that will be used by the test. If you do not know the DataVault’s station 
ID, open /etc/io.conf, which describes the CM’s I/O configuration. 
Appendix J explains how to interpret the contents of /etc/io.conf. 
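
The arithmetic behind the 960 limit on the start block can be worked through as follows. This Python sketch only restates the numbers given in the note; the function names are hypothetical, and it assumes the diagnostic zone's blocks are numbered 0 through 1023, which is what the figure 1023 in the note implies.

```python
DIAG_ZONE_BLOCKS = 1024   # size of the DataVault diagnostic zone (from the note)
MAX_BLOCKS_WRITTEN = 64   # largest number of blocks written by these tests


def max_start_block(zone_blocks=DIAG_ZONE_BLOCKS, max_written=MAX_BLOCKS_WRITTEN):
    """Highest start block at which the largest transfer still fits."""
    last_block = zone_blocks - 1             # 1023
    return last_block - (max_written - 1)    # 1023 - 63


def valid_start_block(block):
    """True if block is within the range the text allows (1 to 960)."""
    return 1 <= block <= max_start_block()


print(max_start_block())
print(valid_start_block(960), valid_start_block(961))
```

A 64-block transfer that starts at block 960 ends at block 1023, the last block of the diagnostic zone.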

Both tests require the DataVault port that will be used in the test to be configured 
for command channel operation. To invoke command channel mode, run 
dvcoldboot +cn on the DataVault you plan to test. Enter +c0 or +c1 to 
specify the appropriate DataVault port. 

When you are done with these tests, run dvcoldboot -cn (with n = 0 or 1) to 
restore the port to the data channel mode. 

NOTE: Although the DataVault firmware controlling the port ordinarily switches 
automatically to data channel mode as needed, it is advisable to explicitly turn 
command channel mode off before using the DataVault port for any other 
operations. 


B.3.2 End-to-End Tests 

cmdiag includes three groups of I/O verifiers, each of which targets a different 
type of peripheral device. 

The tests within each group differ from one another in the data patterns they use 
and/or in the specific hardware modules they target. Currently the three groups 
of end-to-end tests support DataVault, CM-HIPPI, and VMEIO peripherals (e.g., 
CM-IOPG). Figure 25 through Figure 27 describe the various tests contained in 
each device-specific group. 

NOTE: A diagnostic server must be running on the I/O device’s host computer. 
The procedure for starting an I/O device’s diagnostic server can be found in the 
I/O device’s installation and service manual. 

These tests report errors to the screen and to diag-error-log.hostname in the 
local directory. 


B.3.2.1 Alternative Approaches to Using End-to-End Tests 

Invoke end-to-end tests within the cmdiag shell in either of two ways: 

■ Invoke a single, device-specific test group to verify the functionality of a 
particular peripheral device and its I/O path. This is the approach you are 
most likely to use when troubleshooting suspected I/O hardware faults. 
Figure 25 through Figure 27 provide detailed descriptions of these 
device-specific test groups. Section B.3.2.2 explains how to use them. 

■ Alternatively, you can use a single, high-level command, 
test-cmio-device-data-xfer, to automatically execute all test groups that are 
appropriate for your I/O configuration. This can be a convenient method for 
verifying all devices on a multidrop bus or a complete I/O system during 
a major scheduled maintenance session. See Section B.3.2.3 for details. 

In either case, the end-to-end tests must be run on a partition that has an IOBA 
associated with it — that is, the partition must have been created with the -iop 
address option. 


DataVault Test Group 

cm5-write-datavault 

Writes data from CM PNs to DataVault. 

Syntax: cm5-write-datavault pattern block_size speed 

For example, 

cm5-write-datavault ffffffff 4 2 

writes 4 blocks of data in pattern 0xffffffff at speed 2 to a DataVault. 


cm5-read-datavault 

Reads data from DataVault. 

Syntax: cm5-read-datavault pattern block_size speed 

For example, 

cm5-read-datavault 2 4 2 

reads 4 blocks of data at speed 2 from a DataVault. 


cmio-dv-iope-xfer 

Transfers data from CM PNs to a DataVault. 

Syntax: cmio-dv-iope-xfer pattern block_size speed 

For example, 

cmio-dv-iope-xfer ffffffff 4 2 

transfers 4 blocks of data in pattern 0xffffffff at speed 2 to a DataVault. 


cmio-dv-iope-all-pattern-xfer 

Transfers all the data patterns 0x00000000, 0xffffffff, 0xaaaaaaaa, 0x55555555, 
and 0x37c837c8 from CM PNs to a DataVault. 

Syntax: cmio-dv-iope-all-pattern-xfer block_size speed 

For example, 

cmio-dv-iope-all-pattern-xfer 4 2 

transfers 4 blocks of data in patterns 0x00000000, 0xffffffff, 0xaaaaaaaa, 0x55555555, 
and 0x37c837c8 at speed 2 to a DataVault. 

Figure 25. DataVault test group — page 1 of 1 


HIPPI Test Group 

ABBREVIATIONS USED: 

HIPPI_IOP = CM-HIPPI interface to the CMIO bus 
SM        = CM-HIPPI Station Manager 
SRC       = CM-HIPPI's Source module 
DEST      = CM-HIPPI's Destination module 


cm5-read-hippi 

Reads data from CM-HIPPI. 

Syntax: cm5-read-hippi test block_size speed 

test = 0, 1, 2, 3, or 4 indicates different CM-HIPPI paths: 

0 SM -> HIPPI_IOP -> CM-5 
2 SM -> DEST -> HIPPI_IOP -> CM-5 

For example, 

cm5-read-hippi 2 4 2 

reads 4 blocks of data at speed 2 from a CM-HIPPI. The data path is 
HIPPI_Station_Manager -> HIPPI_Destination -> HIPPI_IOP -> CM-5 


cm5-write-hippi 

Writes data from PNs to CM-HIPPI. 

Syntax: cm5-write-hippi test pattern block_size speed 

test = 0, 1, 2, 3, or 4 indicates different CM-HIPPI paths: 

0 CM-5 -> HIPPI_IOP -> SM 
1 CM-5 -> HIPPI_IOP -> SRC -> SM 
3 CM-5 -> HIPPI_IOP -> SRC -> DEST -> SM 

For example, 

cm5-write-hippi 0 ffffffff 4 2 

writes 4 blocks of data in pattern 0xffffffff at speed 2 to a CM-HIPPI. The data path is 
CM-5 -> HIPPI_IOP -> HIPPI_Station_Manager 

Figure 26. HIPPI test group — page 1 of 3 


HIPPI Test Group 
(continued) 


cmiohippi-data-pattern-xfer 

Transfers data from PNs to CM-HIPPI. 

Syntax: cmiohippi-data-pattern-xfer test pattern block_size speed 

test = 0, 1, 2, 3, or 4 indicates different CM-HIPPI paths: 

0 CM-5 -> HIPPI_IOP -> SM -> HIPPI_IOP -> CM-5 
1 CM-5 -> HIPPI_IOP -> SRC -> SM -> HIPPI_IOP -> CM-5 
2 CM-5 -> HIPPI_IOP -> SRC -> SM -> DEST -> HIPPI_IOP -> CM-5 
3 CM-5 -> HIPPI_IOP -> SRC -> DEST -> SM -> HIPPI_IOP -> CM-5 
4 CM-5 -> HIPPI_IOP -> SM -> DEST -> HIPPI_IOP -> CM-5 

For example, 

cmiohippi-data-pattern-xfer 2 ffffffff 4 2 

transfers 4 blocks of data in pattern 0xffffffff at speed 2 to CM-HIPPI. The data path is: 
CM-5 -> HIPPI_IOP -> HIPPI_Source -> HIPPI_Station_Manager -> 
HIPPI_Destination -> HIPPI_IOP -> CM-5 


cmiohippi-cmio-data-xfer 

Transfers all the data patterns 0x00000000, 0xffffffff, 0xaaaaaaaa, 0x55555555, 
and 0x37c837c8 from CM PNs to CM-HIPPI. 

Syntax: cmiohippi-cmio-data-xfer block_size speed 

For example, 

cmiohippi-cmio-data-xfer 0 4 2 

transfers 4 blocks of data in patterns 0x00000000, 0xffffffff, 0xaaaaaaaa, 0x55555555, 
and 0x37c837c8 at speed 2 to a CM-HIPPI. 

Figure 26. HIPPI test group — page 2 of 3 


cmiohippi-cmio-data-all-path-xfer 

Transfers all the data patterns 0x00000000, 0xffffffff, 0xaaaaaaaa, 0x55555555, 
and 0x37c837c8 from CM PNs to CM-HIPPI. Tests all different CM-HIPPI paths. 

Syntax: cmiohippi-cmio-data-all-path-xfer block_size speed 

For example, 

cmiohippi-cmio-data-xfer 0 4 2 

transfers 4 blocks of data in patterns 0x00000000, 0xffffffff, 0xaaaaaaaa, 
0x55555555, and 0x37c837c8 at speed 2 to a CM-HIPPI. The test paths are: 

(0) CM-5 -> HIPPI_IOP -> SM -> HIPPI_IOP -> CM-5 
(1) CM-5 -> HIPPI_IOP -> SRC -> SM -> HIPPI_IOP -> CM-5 
(2) CM-5 -> HIPPI_IOP -> SRC -> SM -> DEST -> HIPPI_IOP -> CM-5 
(3) CM-5 -> HIPPI_IOP -> SRC -> DEST -> SM -> HIPPI_IOP -> CM-5 
(4) CM-5 -> HIPPI_IOP -> SM -> DEST -> HIPPI_IOP -> CM-5 

Figure 26. HIPPI test group — page 3 of 3 


VMEIO Test Group 


cm5-write-vmeio 

Writes data from CM PNs to VMEIO. 

Syntax: cm5-write-vmeio pattern mode ram-mode block_size speed 

mode       m (master) or s (slave) 
ram-mode   m (memory), f (fifo) or b (bypass) 

For example, 

cm5-write-vmeio ffffffff m f 4 2 

writes 4 blocks of data in pattern 0xffffffff at speed 2 to a VMEIO in master mode. 


cm5-read-vmeio 

Reads data from VMEIO. 

Syntax: cm5-read-vmeio mode ram-mode block_size speed 

mode       m (master) or s (slave) 
ram-mode   m (memory), f (fifo) or b (bypass) 

For example, 

cm5-read-vmeio m f 4 2 

reads 4 blocks of data at speed 2 from a VMEIO in master mode. 

Figure 27. VMEIO test group — page 1 of 2 


VMEIO Test Group 
(continued) 


cmio-vmeio-iope-xfer 

Transfers data from CM PNs to VMEIO. 

Syntax: cmio-vmeio-iope-xfer pattern mode ram-mode block_size speed 

mode       m (master) or s (slave) 
ram-mode   m (memory), f (fifo) or b (bypass) 

For example, 

cmio-vmeio-iope-xfer ffffffff m f 4 2 

transfers 4 blocks of data in pattern 0xffffffff at speed 2 to a VMEIO in master mode. 


cmio-vmeio-iope-all-pattern-xfer 

Transfers all the data patterns 0x00000000, 0xffffffff, 0xaaaaaaaa, 0x55555555, and 
0x37c837c8 from CM PNs to VMEIO. 

Syntax: cmio-vmeio-iope-all-pattern-xfer access-mode ram-mode 
block-count speed 

For example, 

cmio-vmeio-iope-all-pattern-xfer 4 m f 2 

transfers 4 blocks of data in patterns 0x00000000, 0xffffffff, 0xaaaaaaaa, 0x55555555, 
and 0x37c837c8 at speed 2 in master mode to a VMEIO. 

Figure 27. VMEIO test group — page 2 of 2 


B.3.2.2 Executing Individual Tests 

Perform the following steps to execute individual tests. Figure 28 provides an 
example of this procedure. 

NOTE: The I/O device must have its diagnostic server running in background. 

1. Log in as root on the CM-5 diagnostic server and change directory to 
/usr/diag/cmdiag. 

login: user_id 

% su 

password: root_password 
SU hostname /dev/console 

# cd /usr/diag/cmdiag 

2. Set the CMDIAG_PATH and JTAG_SERVER environment variables. The 
default CMDIAG_PATH is /usr/diag/cmdiag. The JTAG_SERVER variable 
must specify the hostname of the diagnostic server. 

# setenv CMDIAG_PATH /usr/diag/cmdiag 

# setenv JTAG_SERVER diag_server_hostname 

3. Create a partition using the -iop address option to associate a particular 
IOBA with the partition. 

# cmpartition create -pm pm_name -iop address 

NOTE: If the desired partition already exists, halt its timesharing daemon 
by executing cmpartition stop on the partition manager. 

# rlogin -l root pm_name 
password: root_password 

# cmpartition stop 

# exit 

4. Invoke the cmdiag environment and specify which peripheral device the 
test program will write to and read from. Do this by entering the following 
at the <CM-DIAG> prompt. 

# ./cmdiag -p pm_name 

<CM-DIAG> select-cmio-server hostname 

This command establishes a link to the desired I/O diagnostic server. 
hostname is the hostname of the I/O device on which that server is running. 

5. Next, identify the type of peripheral device that will be involved in the 
test. 

<CM-DIAG> init-cmio-diag-environment device_type 

device_type identifies the type of I/O device; legal strings are: "hippi", 
"dv", or "vmeio". This command also resets the partition. 

6. Individual tests can now be invoked at the <CM-DIAG> prompt. Three 
examples from the CM-HIPPI test group are shown below. The procedure 
example shown in Figure 28 uses tests from the DataVault group. 

<CM-DIAG> cm5-write-hippi 0 ffffffff 4 2 
<CM-DIAG> cm5-read-hippi 0 4 2 
<CM-DIAG> cmiohippi-cmio-data-xfer 
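
Steps 4 through 6 always produce the same sequence of entries at the <CM-DIAG> prompt: select the server, initialize for the device type, then run tests. The Python sketch below assembles that sequence as plain strings so the ordering and the legal device_type values are explicit. The function name and the hostname "hippi-host" are hypothetical; the sketch does not talk to cmdiag.

```python
LEGAL_DEVICE_TYPES = ("hippi", "dv", "vmeio")  # legal strings listed in step 5


def session_commands(io_server_hostname, device_type, tests):
    """Return the cmdiag entries for one end-to-end test session, in order."""
    if device_type not in LEGAL_DEVICE_TYPES:
        raise ValueError(f"device_type must be one of {LEGAL_DEVICE_TYPES}")
    commands = [
        f"select-cmio-server {io_server_hostname}",   # step 4
        f"init-cmio-diag-environment {device_type}",  # step 5
    ]
    commands.extend(tests)                            # step 6
    return commands


# Hypothetical CM-HIPPI host, using two of the example tests from step 6.
for line in session_commands(
    "hippi-host",
    "hippi",
    ["cm5-write-hippi 0 ffffffff 4 2", "cm5-read-hippi 0 4 2"],
):
    print("<CM-DIAG>", line)
```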
Invoking Individual End-to-End Tests 
(see Section B.3.2.2) 


The system used to illustrate this procedure example has the following features. 

- Diagnostic server is named homer.think.com. Prompt = hom#. 

- Tests will be run on partition named virgil.think.com and a DataVault, 
  which is connected to an IOBA at address 480. Prompt for partition 
  manager is virg#. 


1  login: user_id 
   password: root_password 
   SU homer.think.com /dev/console 
   hom# cd /usr/diag/cmdiag 

2  hom# setenv CMDIAG_PATH /usr/diag/cmdiag 
   hom# setenv JTAG_SERVER homer.think.com 

3  hom# cmpartition create -pm virgil.think.com -iop 480 

4  hom# ./cmdiag -p virgil.think.com 
   <CM-DIAG> select-cmio-server dv1-server 

   diagnostic test report 


Figure 6. Example for invoking individual end-to-end tests. 


B.3.2.3 Executing Test Groups Automatically 

test-cmio-device-data-xfer automatically executes the appropriate set of 
test groups for the I/O configuration in which it is invoked. It decides which test 
groups to run based on the following. 

■ test-cmio-device-data-xfer must be run on a partition that has an 
IOBA associated with it (i.e., the partition was created with the -iop 
address option). 

■ The list of I/O devices connected to that IOBA is provided to cmdiag when 
cmdiag is invoked with the -p pm_name option. 

■ test-cmio-device-data-xfer surveys the list of I/O devices. It then 
selects the first device listed and runs the test group that applies to its 
device type. 

■ If multiple I/O devices are connected to that IOBA, 
test-cmio-device-data-xfer will select the next listed device and run the 
appropriate test group for it. It repeats this process until all I/O devices 
connected to the target IOBA have been tested. 
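
The selection logic described in the list above amounts to a loop over the device list. The Python sketch below models only that ordering; the device names are invented, and the mapping from device type to test group is an assumption based on the three device-specific groups described in Section B.3.2, not a description of how the command is implemented.

```python
# Assumed mapping from device type to the device-specific test group.
TEST_GROUP_FOR = {
    "dv": "DataVault test group",
    "hippi": "HIPPI test group",
    "vmeio": "VMEIO test group",
}


def plan_test_groups(devices):
    """devices: (name, device_type) pairs in the order listed for the IOBA.

    Returns (name, test group) pairs in the order they would be run:
    first listed device first, then each following device in turn.
    """
    return [(name, TEST_GROUP_FOR[device_type]) for name, device_type in devices]


# Hypothetical IOBA with a DataVault and a CM-IOPG (a VMEIO peripheral).
for name, group in plan_test_groups([("dv1", "dv"), ("iopg1", "vmeio")]):
    print(f"{name}: run {group}")
```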

Perform the following steps to run end-to-end tests automatically. Figure 7 
provides an example of this procedure. 

NOTE: Each I/O device connected to the target IOBA must have its diagnostic 
server running in background. 

1. Log in as root on the CM-5 diagnostic server and change directory to 
/usr/diag/cmdiag. 

login: user_id 

% su 

password: root_password 

SU hostname /dev/console 

# cd /usr/diag/cmdiag 

2. Set the CMDIAG_PATH and JTAG_SERVER environment variables. The 
default CMDIAG_PATH is /usr/diag/cmdiag. The JTAG_SERVER variable 
must specify the hostname of the diagnostic server. 

# setenv CMDIAG_PATH /usr/diag/cmdiag 

# setenv JTAG_SERVER diag_server_hostname 

3. Create a partition using the -iop address option to associate a particular 
IOBA with the partition. 

# cmpartition create -pm pm_name -iop address 

NOTE: If the desired partition already exists, halt its timesharing daemon 
by executing cmpartition stop on the partition manager. 

# rlogin -l root pm_name 
password: root_password 

# cmpartition stop 

# exit 

4. Invoke the cmdiag environment and enter test-cmio-device-data-xfer 
at the <CM-DIAG> prompt. 

# ./cmdiag -p pm_name 

<CM-DIAG> test-cmio-device-data-xfer 


Invoking End-to-End Tests Automatically 
(see Section B.3.2.3) 


The system used to illustrate this procedure example has the following features. 

- Diagnostic server is named homer.think.com. Prompt = hom#. 

- Tests will be run on partition named virgil.think.com and a DataVault, 
  which is connected to an IOBA at address 480. Prompt for partition 
  manager is virg#. 


1  login: user_id 
   password: root_password 
   SU homer.think.com /dev/console 
   hom# cd /usr/diag/cmdiag 

2  hom# setenv CMDIAG_PATH /usr/diag/cmdiag 
   hom# setenv JTAG_SERVER homer.think.com 

3  hom# cmpartition create -pm virgil.thin 

   diagnostic test report 

   <CM-DIAG> quit 


Figure 7. Example for invoking end-to-end tests automatically. 


B.3.3 dvtest5-vu, dvtest5-sparc 

dvtest5-vu and dvtest5-sparc are two versions of the same system verifier 
program — dvtest5-vu is used on systems with vector units installed and 
dvtest5-sparc is used on systems without vector units. Each creates and 
writes test files to the DataVault or to a VMEIO-based peripheral, such as a 
CM-IOPG, and then reads back each file, comparing it with the expected data 
pattern. Since both versions are functionally identical, they are referred to 
here as dvtest5-ext. 

dvtest5-ext is functionally equivalent to a user application that has file system 
calls. Consequently, it requires the following conditions. 

■ The partition in which it is executed must have ts-daemon running. 

■ The IOBA that will be used must be defined in /etc/io.conf. 

■ fsserver must be running in background on the I/O device. 

■ The DVWD environment variable must specify the file server host of the 
target I/O device. 

The syntax for dvtest5-ext is as follows. NOTE: Use either dvtest5-vu or 
dvtest5-sparc in place of dvtest5-ext. 

dvtest5-vu | dvtest5-sparc -x -t -1 -a[1] -s -h 
-g int1,...intn -d directory-name -l testname ... testname 

-x               Exit on error. 

-t               Report tersely. 

-1               Run the applicable test(s) once and then exit. 

-a[1]            Run all tests automatically (no menu). If -a1 is specified, 
                 the tests run once; otherwise they loop forever. Stop 
                 execution by entering Ctrl-C. 

-s               Run software test subset automatically (no menu). 

-h               Run hardware test subset automatically (no menu). 

-g int1,...intn  Specify a geometry to be applied to the data being 
                 transferred. A string of two or more integers 
                 separated by commas specifies the geometry. Each integer 
                 represents a dimension measured in 32-bit words. 

-l testname(s)   Run the test(s) specified by the testname argument(s). 

Run dvtest5-ext -a to exercise the CM-5 system most thoroughly. This 
option includes tests of the file system software and would ordinarily be done 
following installation of CM-5 system software and/or installation of a new I/O 
device. dvtest5-ext cycles through the tests repeatedly until you exit with 
Ctrl-C. 

Use the -g switch with one or more integer arguments to specify a geometry for 
the test data. Each integer argument specifies a dimension in units of four bytes 
(32-bit words). For example, the recommended geometry for data sent to a 
DataVault is 64 by 64, which yields a 16,384-byte data block, the DataVault’s 
default block size. 
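
The block-size arithmetic for a geometry can be verified with a few lines of Python. This is a worked example of the calculation stated above (dimensions in 32-bit words, four bytes per word); the function name is hypothetical.

```python
from math import prod

WORD_BYTES = 4  # each geometry dimension is measured in 32-bit words


def geometry_bytes(dims):
    """Return the size in bytes of a data block with the given geometry."""
    if not dims or any(d <= 0 for d in dims):
        raise ValueError("geometry needs one or more positive dimensions")
    return prod(dims) * WORD_BYTES


# Recommended DataVault geometry from the text: 64 by 64 words.
print(geometry_bytes([64, 64]))
```

For the 64 by 64 geometry this gives 64 x 64 x 4 = 16,384 bytes, matching the DataVault's default block size.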

Use the -h option to limit the tests to hardware functions; this will save time 
during routine (i.e., preventive) maintenance sessions when the validity of the 
system software is not in doubt. 

Likewise, the -s option allows you to focus diagnostic attention on software 
functions. 

The -l option lets you specify individual tests by name. It is useful when 
troubleshooting specific hardware or software functions. 

If you do not specify any tests via -a, -s, -h, or -l testname, dvtest5-ext will 
present you with a test menu, allowing you to specify individual tests by number. 
This menu is shown in Figure 8. 


 1. basics test                file creation/deletion (-s) 
 2. data test                  simple file read/write (-s, -h) 
 3. write test                 writing files (-s, -h) 
 4. link test                  link/unlink (-s) 
 5. abs_seek test              absolute seek (random) (-s, -h) 
 6. rel_seek test              relative seek (deterministic) (..., -h) 
 7. dir_basics test            mkdir/rmdir/chdir (-s) 
 8. many_dirs test             creating many directories (-s) 
 9. many_files test            creating/deleting many files (-s) 
10. serial_io test             serial I/O transfers (-s) 
11. mixed_io test              mixed serial and parallel I/O 
12. parallel_partial test      parallel I/O w/ partial blocks (..., -h) 
13. transpose test             transposing serial data (-s) 
14. transfer_timing test       transfer speed (-s, -h) 
15. raw_transfer_timing test   raw transfer speed (-s, -h) 
16. overhead_timing test       overhead speed (-s, ... 
17. max_transfer_timing test   max transfer speed 
18. reliability test           reliability (-h) 


Figure 8. The dvtest5-ext menu. 


If dvtest5 fails, read the DCP error log on the DataVault. To read this log, 
invoke dvdiag on the DataVault console and run rdlog. 

% /usr/diag/tad/dvdiag 
<DV-DIAG> rdlog byte_count byte_offset 

byte_count specifies the number of bytes of log contents you want to display. 
byte_offset specifies the starting byte to be displayed. For example, to display the 
most recent 300 bytes of log contents, enter: 

<DV-DIAG> rdlog 300 0 


B.3.4 hippi-loop5 

hippi-loop5 performs a role similar to dvtest5 for CM-HIPPI systems. It 
runs application code with file system calls to verify that data transfers between 
the CM-5 and CM-HIPPI are fully functional. Test data completes a circuit by 
looping from the destination board to the source board within the CM-HIPPI. 

The command syntax for hippi-loop5 is as follows. 

hippi-loop5 -iifield -r -w -ppattern -ssize -d -u 
-nN -v -g -h -R 


-i ifield     Use ifield as the I-field when establishing the loopback 
              connection. 

-r            Test only CM-5 reads from the CM-HIPPI channel. 

-w            Test only CM-5 writes to the CM-HIPPI channel. 

-p pattern    Specify the data pattern to be used. Valid pattern inputs 
              are: data-equal-address, random, or a hex value for a 
              constant pattern. Default is data-equal-address. 

-s size       Specify the size of the transfer. Valid size inputs are: 16k, 
              32k, 64k, 128k, 1M, 2M, and 4M. 

-d            Drop the connection between tests. 

-u            Use the existing connection if possible. 

-n Nrep       Repeat each pattern Nrep times. 

-v            Report verbosely when establishing and breaking 
              connections. 

-g            Print interface status. 

-h            Use standard CM-HIPPI ports instead of loopback ports. 


hippi-loop5 has the same prerequisites as dvtest5, namely: 

■ The partition in which it is executed must have ts-daemon running. 

■ The target CM-HIPPI must be defined in /etc/io.conf. 

■ fsserver must be running in background on the CM-HIPPI. 

■ DVWD must specify the CM-HIPPI’s file server host. 

If hippi-loop5 fails, read dmesg on the CM-HIPPI to determine which byte 
was in error. On the CM-HIPPI console, enter: 

% dmesg 


B.4 DataVault Internal Diagnostics 

dvdiag is a DataVault file server shell command that invokes a special 
diagnostic environment for testing DataVault hardware functions. This 
environment is controlled entirely by the file server, enabling you to test 
nearly all DataVault functions without the aid of the CM or other external 
computer. dvdiag resides on the DataVault’s station manager in 
/usr/local/etc/diag. 

The overall command syntax is: 

dvdiag -m -f -ggroupname -C +Ebdfilt -Ebdfilt 

The first three arguments relate to how dvdiag tests are accessed. The other three 
control the behavior of the dvdiag command interface (-C) and the test 
environment (+E and -E). 

dvdiag provides three test access modes. Depending on how you use the first 
three arguments, you can invoke: 

■ a complete, predefined test suite 

■ functional test groups 

■ individual tests 

Each of these modes is described separately in Sections B.4.1 through B.4.3. The 
arguments +E and -E are also explained in Section B.4.1. -C is explained in 
Section B.4.3. 


B.4.1 Complete Test Suite 

To invoke a comprehensive test suite that will exercise nearly all of the 
DataVault’s internal hardware functions, use the following command syntax. 

dvdiag -m +Ebdfilt -Ebdfilt 

This executes the most rigorous level of testing. While its name implies use in 
a manufacturing setting, this level is appropriate for field use as well. The -f 
(field) argument invokes a somewhat abbreviated level of testing and may be 
used when test time must be kept to a minimum. The full manufacturing test suite 
is fast enough, however, to be suitable for nearly all diagnostic situations. 

+E enables the environment condition represented by its accompanying flag. -E 
is used to disable the same set of environment variables. These variables are 
explained below. 

b     Break on error (default) 

d#    Display number (#) of errors. Default is 16. If Log errors 
      is also set, this flag controls how many errors are logged. 

f     Loop forever 

i     Ignore errors 

l     Log errors 

t     Trace option. This is intended for debugging activities in 
      manufacturing; it is not ordinarily used in the field. 

Repeat the +E or -E switch for each variable, separating them with spaces. The 
following example illustrates a command line that enables command completion, 
specifies 20 as the number of errors to display, and enables error logging. 
The same command disables the break-on-error condition. 

dvdiag -C +Ed20 +El -Eb 
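
The rule of one +E or -E switch per variable can be expressed as a small helper. The Python sketch below only assembles argument strings from the flag letters listed above (b, d#, f, i, l, t); the function name is hypothetical, and the check that only d carries a number is an assumption drawn from the flag table.

```python
VALID_FLAGS = set("bdfilt")  # flag letters from the table above


def env_args(enable=(), disable=()):
    """Build one +E switch per enabled flag and one -E switch per disabled flag."""
    args = []
    for switch, flags in (("+E", enable), ("-E", disable)):
        for flag in flags:
            letter, count = flag[:1], flag[1:]
            if letter not in VALID_FLAGS:
                raise ValueError(f"unknown environment flag: {flag!r}")
            # Assumption: only d (display count) takes a numeric suffix.
            if count and (letter != "d" or not count.isdigit()):
                raise ValueError(f"unexpected suffix in flag: {flag!r}")
            args.append(switch + flag)
    return args


# Reproduces the example command line: display 20 errors, log errors,
# and do not break on error.
print(" ".join(["dvdiag", "-C"] + env_args(enable=["d20", "l"], disable=["b"])))
```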


B.4.2 Functional Test Groups 

You can invoke a specific subgroup of tests within the -f suite by specifying 
the -g groupname argument, where groupname is the name of a predefined group 
of tests. The test groups currently available in dvdiag are summarized below. 


November 17,1992 



Appendix B. I/O Diagnostic Tools 



■ dcp — This group exercises hardware functions on the DCP board. 

■ dvi — This group exercises hardware functions on the DVI boards. 

■ scsi — This group runs a set of tests specific to the SCSI boards. 

■ dv — This group tests connectivity between the DCP, DVI, DP, and SCSI 
boards. It also runs drive sparing and ECC logic tests for the DP board. 

■ dvx — This test reads and writes 1K buffers of data at full speed. 

For example, to run the field program’s DVI test group, type: 


% dvdiag -f -gDVI 


B.4.3 Invoking Individual Tests 

If you enter dvdiag without the -f (or -m) argument, you invoke the dvdiag 
command interpreter, which is represented by the <DV-DIAG> prompt. At this 
prompt, you can explicitly invoke any individual tests or test groups that are 
contained in dvdiag. The syntax for operating in this mode is: 

dvdiag -C +Ebdfilt -Ebdfilt 

The -C argument is the command completion option. It allows you to enter 
abbreviated commands at the <DV-DIAG> prompt. Instead of typing the full 
command, just enter enough characters to distinguish the command from all others 
and then press Escape. When the command completion facility finishes the full 
name, press Return to enter it. 
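
The idea of entering "enough characters to distinguish the command from all others" is unique-prefix matching. The Python sketch below illustrates the idea only; it is not the dvdiag implementation. The command test-dcp-parity-generation and rdlog appear in this appendix, while test-dcp-parity-check is a made-up name included to show an ambiguous prefix.

```python
def complete(prefix, commands):
    """Return the single command that starts with prefix, or None if the
    prefix matches no command or more than one command."""
    matches = [c for c in commands if c.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None


COMMANDS = [
    "test-dcp-parity-generation",  # named in this appendix
    "test-dcp-parity-check",       # hypothetical, for illustration only
    "rdlog",                       # named in this appendix
]

print(complete("rd", COMMANDS))                 # unique prefix
print(complete("test-dcp-parity", COMMANDS))    # ambiguous prefix
print(complete("test-dcp-parity-g", COMMANDS))  # long enough to be unique
```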

The +E and -E arguments function the same as described in Section B.4.1. 

Enter the names of the tests you want to run at the <DV-DIAG> prompt. When the 
test completes, dvdiag returns you to the <DV-DIAG> prompt. For example, to 
test the DataVault's parity generator, enter: 

<DV-DIAG> test-dcp-parity-generation 
<DV-DIAG> 

In addition to the individual test commands, the dvdiag command interpreter 
recognizes a number of other commands, which invoke various auxiliary 
functions and utilities. They are described in Appendix B of the DataVault 
Installation and Service Manual. 


B.5 CM-IOPG Internal Diagnostics 

viodiag is a package of functional tests that exercise the CM-IOPG internal 
hardware. It resides in /usr/local/etc on the CM-IOPG station manager. 

viodiag's user interface closely resembles that of dvdiag. That is, you can 
invoke a complete test suite, functional test groups, or individual tests. The 
command syntax for using viodiag is as follows. 

viodiag -m -f -ggroupname -i -C +Efihbld -Efihbld 
-afilename 

-m            Run Manufacturing diagnostic tests for VMEIO device. 

-f            Run Field diagnostic tests for VMEIO device. 

-ggroupname   Run tests for group specified by groupname. 

-i            Include Interactive tests. 

-C            Allow command Completion within viodiag. 

+E            Set diagnostic Environment (activate options): 

                f  = Loop forever 
                i  = Ignore errors 
                h  = Halt on error 
                b  = Break on error (default) 
                l  = Log errors (default) 
                d# = Display error count (default=16) 

-E            Set diagnostic Environment (deactivate options). 

-afilename    Execute a viodiag shell file given by filename. 

As with dvdiag, viodiag offers a comprehensive test suite that is invoked by: 

% viodiag -f 

The environment variables governing how the test suite behaves can be specified 
on the same command line. For example, 

% viodiag -f +Ed20 +El -Eb 

specifies 20 as the number of errors to display and enables error logging. The 
same command disables the break-on-error condition. 

viodiag also provides a set of predefined functional test groups, which are 
invoked using the -ggroupname argument. For example, the following tests the 
RAM FIFO flags. 

% viodiag -f -gRAM-FIFO-FLAGS 

The viodiag test groups are listed below. 

CMIO-BUS-TEST
CMIO-INTERRUPTS
CMIO-FIFO-FLAGS
CMIO-PORT-LOOPBACK
CMIO-PARITY
INTERACTIVE
MASTER-STATUS
RAM-FIFO-FLAGS
RAM-PARITY
REGISTERS
SLAVE-FIFO-RAM
SLAVE-MAPPED-RAM
SLAVE-TIMEOUT
VME
VME-ADDRESS-GEN-TEST-MODE
VME-ADDRESS-GENERATOR
VME-INTERRUPT
VME-MASTER
VME-MASTER-TIMEOUT
VMEIO-CM-TRANSFERS

NOTE: In order to catch any errors reported by viodiag, you must have a window
open on viodiag before you start the test. Then, if any test fails, go to the
window and type show-board-status.


B.6 CM-IOPG Verifiers 

There are two programs that can be used to verify the ability to move files
between a CM-IOPG system and a DataVault on the same CMIO bus. These programs,
serial and TapeDVxfervfr, reside in directory /usr/local/etc on
the CM-IOPG. They are described in Sections B.6.1 and B.6.2.


B.6.1 serial

serial transfers test data between the CM-IOPG and the DataVault, comparing 
the data it reads with the data that it sent. It verifies the CMIO bus as well as major 
portions of the CM-IOPG and DataVault internal hardware. 

The procedure for using serial is shown below. 

1. Log in to the CM-IOPG station manager and set the DVWD environment
variable to specify the DataVault.

login: user_id
Password: user_password
% setenv DVWD datavault_hostname

Use the hostname of the DataVault file server for datavault_hostname.

2. To invoke serial, enter:

% /usr/local/etc/serial 


B.6.2 TapeDVxfervfr 

TapeDVxfervfr transfers test files between the CM-IOPG and the DataVault
and between the CM-IOPG and the CM-TUD. It alternates between the DataVault
and the CM-IOPG using various block sizes. While executing, the test will use
system memory and VMEIO memory. It also uses both variable and fixed block
mode access to the tape. The complete program provides a means for verifying
the ability to transfer files between a CM-TUD and a DataVault.

Before initiating TapeDVxfervfr, verify that an appropriate tape cartridge is
installed in the selected tape drive and that the DataVault file name can be
rewritten if it already exists. If the file name does not exist, a new file
will be created.

There are two user commands associated with this program, TapeDVxfervfr
and test_DV_to_Tape. The first command invokes the program executive and
the second starts the verifier program itself.

test_DV_to_Tape will prompt you for the following argument input. 


Parameter      Expected Format                Description

tape drive     /dev/<drive-name>              Enter the hostname of the tape
                                              drive.
DV file        =dvault:/<path>/<file-name>    Enter the complete path to the
                                              DataVault file.
user_blksize   =Tape Block size               Enter the block size of the
                                              transfer.
vmeio_unit     =VMEIO unit number             Enter the unit number of the
                                              VMEIO module.



The procedure for using TapeDVxfervfr is shown below.

1. To initiate the program, enter: 

% /usr/local/etc/TapeDVxfervfr 

DIAGNOSTIC EXECUTIVE FOR DV/Tape Tests 

State: RELEASE-6-1 Date: 91/08/13 11:57:43 State: RELEASE-6-1 
<test_DV_to_Tape-DIAG>

2. At the prompt, enter

<test_DV_to_Tape-DIAG> test_DV_to_Tape
tape drive /dev/dv_hostname
DV file =dvault:/pathname/filename
user_blksize tapeblock_size
vmeio_unit vmeio_unit_number


B.7 CM-HIPPI Internal Diagnostics


The CM-HIPPI provides a set of tests that allow you to diagnose its internal
hardware functions. These tests are organized into four programs.

■ srcdiag    Tests source board functionality; see Section B.7.1.

■ destdiag   Tests destination board functionality; see Section B.7.2.

■ iopdiag    Tests IOP board functionality; see Section B.7.3.

■ sysdiag    Tests the ability of the source board to send data and
             for the destination board to receive data (uses
             loopback cable). See Section B.7.4.


These tests are executed on the CM-HIPPI station manager. Before starting the 
tests, log in to the station manager as root and change directory to /usr/local/ 
etc/diag. 


login: user_id
% su
password: root_password
SU hostname /dev/console
# cd /usr/local/etc/diag
#


B.7.1 Source Board Functional Test 

The following procedure tests Source board functionality. 

1. First, ensure that the Source board is not cabled to any other device, such
as a remote HIPPI device, or to the Destination board via a loopback cable.

2. Invoke the Source board command interpreter with the command comple¬ 
tion flag. 
# srcdiag -C
<srcdiag>

3. When you see the Source board diagnostic prompt, run the following tests.

<srcdiag> ktest 1     /* tests Source's AMD 29K internals */
<srcdiag> score k     /* displays results */
<srcdiag> stest 1     /* tests Source's internals and */
                      /* station manager interface */
<srcdiag> score s     /* displays results */
<srcdiag> quit
#


B.7.2 Destination Board Functional Test 

The following procedure tests Destination board functionality. 

1. First, ensure that the Destination board is not cabled to any other device,
such as a remote HIPPI device, or to the Source board via a loopback
cable.

2. Invoke the Destination board command interpreter. 

# destdiag -C
<destdiag> ktest 1    /* tests Destination's AMD 29K */
<destdiag> score k
<destdiag> dtest 1    /* tests Destination's internals and */
                      /* station manager internals */
<destdiag> score d
<destdiag> quit
#


B.7.3 IOP Board Functional Test

1. Ensure that no IOP boards are attached to a CMIO bus. 

2. Invoke the IOP diagnostic interpreter with the -C flag and run the
following tests.

# iopdiag -C
<iopdiag> select iop0    /* select first IOP */
<iopdiag> itest 1        /* test first IOP */
<iopdiag> score i        /* print results for first IOP */
<iopdiag> select iop1    /* repeat for each IOP */
<iopdiag> itest 1
<iopdiag> score i
<iopdiag> quit
#

3. Repeat the select iop, itest, and score commands for each IOP.
For each new select iop, simply change the number of the board to be 
selected. Remember, the boards are numbered 0-7 from right to left, as 
viewed from the front of the system. 
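
When several IOP boards are installed, the command sequence in steps 2 and 3 can be written out in advance. The Python sketch below only generates the text of the commands to type at the <iopdiag> prompt; it does not drive iopdiag itself. The function name is hypothetical, and the sketch assumes the boards are named iop0 through iop7, following the 0-7 numbering given above.

```python
def iop_test_commands(board_numbers):
    """Return the iopdiag commands that test each listed IOP board in turn."""
    commands = []
    for number in board_numbers:
        if not 0 <= number <= 7:
            raise ValueError("IOP boards are numbered 0-7")
        commands.append("select iop%d" % number)  # choose the board
        commands.append("itest 1")                # run the IOP test
        commands.append("score i")                # print the results
    commands.append("quit")
    return commands


# Print the sequence for a system with two IOP boards.
for command in iop_test_commands(range(2)):
    print("<iopdiag> " + command)
```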


B.7.4 System (Loopback) Test 

The next step is to test the ability of the Source board to send data and the
Destination board to receive data. This is done by connecting their OUT and IN
ports on the CM-HIPPI bulkhead via a loopback cable.

1. Install the loopback cable between the OUT and IN ports on the CM-HIPPI
bulkhead. NOTE: Do not install any CMIO bus cables yet.

2. From directory /usr/local/etc/diag, invoke the system command inter¬ 
preter. 

# sysdiag -C 
<sysdiag> 

3. When you are asked if you are attached to an I/O bus, answer no.

4. At the <sysdiag> prompt run the following test. 

<sysdiag> etest 1 
<sysdiag> score e 
G.1 Introduction

(to be supplied) 
Appendix H

Tracing I/O Errors 


H.1 Introduction 

This appendix describes three sets of diagnostic procedures for I/O-related
hardware problems.

■ Section H.2 presents the basic procedure for troubleshooting CM-5 I/O 
failures. 

■ Section H.3 describes a supplementary procedure for exercising an
IOBA-to-DataVault connection. This procedure can be useful as a cross-check of
other diagnostic results, particularly for elusive hardware problems that
exhibit unusual failure modes.

■ Section H.4 explains the procedure for using the system exercisers, 
dvtests and hippi-loop. These programs are recommended for veri¬ 
fying overall system functionality after a hardware failure has been cor¬ 
rected and before the system is returned to regular service. 


H.2 Basic I/O Troubleshooting Procedure 

(to be supplied) 
H.3 Verifying IOBA-to-DataVault Path

(to be supplied)


H.4 System Verifiers for DataVault and HIPPI Paths 

(to be supplied) 
hardware.install file



I.1 Introduction

/etc/cm/configuration/hardware.install describes the hardware configuration
of the CM-5 system on which it resides. This file is created at the factory
to reflect the state of the particular CM-5 as it will be installed.

When the system's hardware configuration is changed, you will need to edit the
file to reflect the change; hardware.install must always match the
configuration of the system.

NOTE: For one-to-one replacement of hardware modules, you do not need to up¬ 
date hardware.install because there is no net change to the hardware con¬ 
figuration. 

Figure 31 illustrates a sample hardware.install file. Its contents are
explained below.

NOTE: The numbers to the left of the shaded areas, and the shading itself, are
not part of hardware.install. They have been added to aid in description of
the file.

Figure 31. hardware.install example (pages 1 through 3; figure not reproduced)

I.2 File Header (shaded area 1)

The first area in Figure 31 contains file header information only. You needn't 
ever edit this content. 


I.3 CM System (shaded area 2)

This line introduces the balance of the file, which describes the physical
composition of the system. The general organization of the system description
consists of:

■ General system attributes: system name, DR height, and processing
node type (Sections I.4 through I.6).

■ Individual descriptions of partition managers (Sections I.7 and I.8).

■ Individual descriptions of device cabinets and network cabinet
(Sections I.9 and I.10).


I.4 System Name (shaded area 3)

This line specifies an arbitrary name for the system. In this example, the system 
name is "Calliope." 


I.5 DR Height (shaded area 4)

This line indicates the highest (or root) level of the data network. In this
example, the highest level is 5.

I.6 PN Type (shaded area 5)

These lines describe the system's processing nodes in terms of the following
attributes.

■ PN Memory   Specifies the memory capacity per PN. Currently, this
              attribute can be either 8 Mbytes or 32 Mbytes. In this
              case, it is 8 Mbytes.

■ PN IU       Specifies the type of integer unit in use. Currently,
              SPARC is the only valid entry.

■ PN FPU      Specifies the type of floating point unit in use.
              Currently, SPARC is the only valid entry.


I.7 Partition Managers (shaded area 6)

This section describes each partition manager (PM) in the system. There are two 
PMs in this example. 

Each PM is identified by an integer, which is arbitrarily chosen to distinguish that
PM from all others in the system. In this example, the PMs are designated 0 and
1. This designation is followed by a list of attributes. The PM 0 attributes are
defined below.

■ Hostname               This is a quoted string that gives the PM a
                         name that may be easier to remember than
                         its integer designation. In this example, PM
                         0 is named "homer.think.com."

■ Console                This entry indicates that PM 0 serves as the
                         system administration console.

■ Diagnostic Processor   This entry indicates that the diagnostic
                         server, jtagserver, runs on this PM.

PM 1 serves no other role than partition manager. Consequently, its only attribute 
entry is its hostname, which in this example is "milton.think.com." 

I.8 PN Cabinet 0 (shaded area 7)

This section describes the composition of a single device cabinet. The first item
of description is an integer that uniquely identifies this cabinet in a multi-cabinet
system. By convention, this integer indicates the cabinet's physical position in
relation to other device cabinets in the system. Figure 32 shows the cabinet
numbering scheme used for CM-5 systems of up to 2 K network addresses.


Figure 32. CM-5 device cabinet numbering system. (Figure not reproduced; it
shows the numbered device cabinets arranged around network cabinets 0 and 1
and the network bridge.)


NOTE: Where hardware.install refers to PN cabinets, understand that it
means device cabinets. The term PN (processing node) cabinets is a historical
artifact. Likewise, you should translate references to DR cabinets to network
cabinets.

The cabinet contents are then listed by backplane. Figure 33 shows how
backplanes are numbered in a device cabinet, and Figure 34 through Figure 36
show the slot configurations of the standard PN, DR, and CN backplanes.

In this example, hardware.install shows cabinet 1 to have the following
backplane configuration.

■ PN Backplanes 0-7    All eight PN backplanes contain the standard PN
                       configuration of circuit modules. Consequently, a
                       detailed breakdown of the backplane contents is
                       not given. This standard configuration includes
                       eight PN circuit modules, plus a CN module and a
                       CLKDN module.

■ DR Backplanes 8-9    These backplanes contain circuit modules that form
                       the uppermost levels of the device cabinet's data
                       network. In systems with multiple cabinets, these
                       backplanes are connected by cable to the network
                       cabinet.

■ CN Backplane 10      The control network backplane contains circuit
                       modules that form the uppermost levels of the
                       device cabinet's control network. In systems with
                       multiple cabinets, this backplane is connected by
                       cable to the network cabinet.


Figure 33. CM-5 device cabinet backplane numbering (views from the PN board
insertion side and from the DR and CN board insertion side; figure not
reproduced)

Figure 34. Standard PN backplane slot assignments (figure not reproduced)

Figure 35. Device cabinet DR backplane slot assignments (backplanes 8 and 9;
figure not reproduced)

Figure 36. Device cabinet CN backplane slot assignments (backplane 10; view
from board insertion side; figure not reproduced)


I.9 PN Cabinet 1 (shaded area 8)

In this sample file, a second device cabinet, cabinet 2, contains interfaces to the
partition managers as well as an I/O interface. The various backplane
configurations in cabinet 2 are described separately in Sections I.9.1 through
I.9.4.


I.9.1 PN Backplane 3

A single standard PN backplane, backplane 3, contains both PM interfaces as well 
as auxiliary circuit modules. No other standard PN backplanes are used in this 
cabinet. 

The circuit modules that fill the backplane slots are summarized below, with the
slot location(s) indicated to the right of each circuit module type.

■ SPI 0        The SPI in slot 0 is the interface to partition
               manager 0.

■ SVME 0       This entry associates an SVME module with the SPI in
               slot 0.

■ SPI 4        The SPI in slot 4 is the interface to partition
               manager 1.

■ SVME 4       This entry associates an SVME module with the SPI in
               slot 4.

■ FILLER 1-3   These three slots contain circuit modules that fill the
               gap in the network that would otherwise occur when a
               backplane is not fully populated with functional
               network devices, such as PNs.

■ FILLER 5-7   Same as FILLER 1-3.

■ CN 0         This slot contains a portion of the control network.

■ CLKDN 0      This slot contains the backplane's interface to the
               clock and diagnostic networks.


I.9.2 I/O Backplane 7

This backplane contains a set of circuit modules that together form the interface
to a CMIO bus and one or more peripherals attached to the bus. These peripherals
can include DataVaults, CM-HIPPIs, and/or CM-IOPGs.

The I/O backplane and its constituent circuit modules are referred to as an IOBA
(Input Output Bus Adapter). Figure 37 illustrates the slot organization of an
IOBA chassis. A standard IOBA configuration contains six circuit modules; their
hardware.install entries are summarized below.


■ IOBUF 1      This entry indicates that slot 1 contains one IOBUF
               module.

■ IOBUF 2      Slot 2 contains the second IOBUF module.

■ IOCNTRL 0    Each IOBA has one I/O control module; it is always
               identified by the label 0.

■ IOCHNL 0     This line indicates that an I/O channel is provided in
               slot 0.

■ IODR 0       A standard IOBA has one data network interface module;
               it is always identified by the label 0.

■ IOCLK 0      Each IOBA has one clock interface module; it is always
               identified by the label 0.

Another file, io.conf, contains additional I/O configuration information. It
defines various attributes concerning the components connected to the CMIO bus,
including this IOBA, that are of interest to the fileserver. If the IOBA or CMIO
bus is modified in any way that affects these attributes, io.conf must be
updated as well. Appendix J describes io.conf in detail.



Figure 37. IOBA backplane slot assignments with standard board configuration. 

I.9.3 DR Backplane 8-9

The data network backplanes 8 and 9 contain:

■ DR 0-15      These backplanes contain 16 DR circuit modules, which
               provide the link among all data network components
               residing in this cabinet. In multi-cabinet systems,
               they also form the interface to the higher levels of
               the data network in the network cabinet.

■ CLKBUF 0     This module buffers and distributes system clocks to
               the Data Network boards.


I.9.4 CN Backplane 10

The control network backplane, backplane 10, contains 6 CN circuit modules
and a CLKDN circuit module.

■ CN 0-5       This backplane contains 6 CN circuit modules, which
               provide the link among all control network components
               residing in this cabinet. In multi-cabinet systems,
               they also form the interface to the higher levels of
               the control network in the network cabinet.

■ CLKBUF 0     This module receives the system clock signal from the
               CLKDN board and drives it out to the Control Network
               boards.

■ CLKDN 0      In systems with two or more device cabinets (greater
               than 256 network addresses), this module is the
               interface to the clock and diagnostic networks residing
               in the network cabinet. In single-device-cabinet
               systems, this module serves as the system clock and
               diagnostic network root.


November 17,1992 




CM-5 Field Service Guide — Preliminary 


I.10 DR Cabinet (shaded area 9)

The network cabinet contains the data and control network modules that form the 
uppermost levels of their respective trees. The first entry in this section is an 
identifier for this cabinet—a large integer that will distinguish this network cabi¬ 
net from all other cabinets in the system. By convention, 4096 is used as the 
identifier for the first network cabinet in the system. 

NOTE: Except for the requirement that this number be large enough to exceed the 
highest possible device cabinet number, its value has no specific meaning. 

In the network cabinet, the DR and CN backplanes are in the center section of the
cabinet, the space occupied by backplanes 0-3 in a device cabinet. Figure 38
shows the location of these backplanes. Figure 39 and Figure 40 show the DR
and CN slot assignments in each.

The hardware.install entries representing these backplanes are summarized
below.


■ DR 0         This backplane contains 16 DR circuit modules and a
               CLKBUF circuit module.

■ DR 1         This backplane contains 16 DR circuit modules and a
               CLKBUF circuit module.

■ CN 4         This backplane contains 6 CN circuit modules, one
               CLKBUF circuit module, and two CLKDN circuit modules.


NOTE: Although the sample hardware.install file represents only a single
network cabinet with only two DR backplanes populated, Figure 39 and
Figure 40 illustrate the backplane slot assignments for systems with two network
cabinets and height 6 DR and CN modules.

Figure 38. CM-5 network cabinets 0 and 1 backplane numbering (view from
board insertion side; figure not reproduced)

Figure 39. DR backplane slot assignments for network cabinets 0 and 1
(slots 0 through 15; figure not reproduced)

Figure 40. CN backplane slot assignments for network cabinets 0 and 1
(views from board insertion side; figure not reproduced)

NOTE: Height 5 CN modules serve the local device cabinets. Height 6 CN
modules serve the entire system.
Appendix J

io.conf file 


The file /etc/io.conf is the CM-5 I/O system's configuration file. It must
always accurately reflect the state of the I/O system. io.conf is created by a
Thinking Machines Customer Support representative when the I/O system is
installed. Thereafter, io.conf must be updated whenever the I/O system is
reconfigured. This section explains the components of io.conf so that you can
edit them if the system's configuration changes.

io.conf is an ASCII file. As such, these general-format rules apply: 

■ Numeric arguments can be specified in hex (leading 0x or 0X), octal
(leading 0), or decimal.

■ Characters following a semicolon on the same line are ignored.

■ As long as all entries are left-justified, io.conf can contain any amount
of white space.

■ The parser is case-sensitive; all alpha-text must be typed into io.conf
exactly as shown in this section.
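
The numeric and comment rules above can be illustrated with a short Python sketch. It is not the parser used by the CM-5 software; the function names are hypothetical, and the sketch assumes that the hex prefix is the digit zero followed by x or X and that a leading digit zero marks an octal value.

```python
def strip_comment(line):
    """Drop everything from the first semicolon on, then trim white space."""
    return line.split(";", 1)[0].strip()


def parse_number(token):
    """Convert one numeric io.conf entry to an integer."""
    if token[:2] in ("0x", "0X"):
        return int(token[2:], 16)      # hex: leading 0x or 0X
    if len(token) > 1 and token.startswith("0"):
        return int(token[1:], 8)       # octal: leading 0
    return int(token, 10)              # otherwise decimal


# Values taken from lines of the kind shown in the sample file.
for line in ["480   ; NI physical address", "0x2   ; Buffer board slot number"]:
    print(parse_number(strip_comment(line)))
```

A line that holds only a comment reduces to an empty string, so a caller would skip it before calling parse_number.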

Some of the entries in io.conf require you to count hardware entities. Count 
the first entity as number 1, not number 0. 

Figure 41 illustrates an I/O system with two IOBAs, two DataVaults, a CM-HIPPI,
a CM-IOPG, and a CM-TUD. Figure 42 represents the io.conf file for the
configuration shown in Figure 41.

The rest of io.conf consists of two main modules:

■ The Channel_Board_Configuration module contains information
about the IOBAs. Section J.1 describes this module in detail.

■ The IO_device_configuration module describes the I/O devices
configured into the system. Section J.2 describes this module in detail.

Figure 41. Sample CM-5 I/O system configuration. (Figure not reproduced; it
shows IOBA0 and IOBA1 with their channel, buffer, controller, data network,
and I/O clock boards, the CMIO buses with bus IDs 100, 102, and 104, and the
station ID of each attached device.)

IO_configuration_file_v1.0   ; Identification string

Channel_Board_Configuration
2       ; Total number of channel boards in
        ; all IOBAs combined.

        ; Channel board in IOBA0
480     ; IOPN NI physical address
0x0     ; Channel board slot number
100     ; CMIO bus id
1       ; Station id
0       ; CMIO arbiter status flag
2       ; CMIO bus speed
0x1     ; Buffer board slot number
        ; (Must be one of 1,2,5,7,8,a,b)
0x2     ; Buffer board slot number
        ; (Must be one of 1,2,5,7,8,a,b)

        ; Channel board in IOBA1
544     ; IOPN NI physical address
0x0     ; Channel board slot number
102     ; CMIO bus id
2       ; Station id
0       ; CMIO arbiter status flag
2       ; CMIO bus speed
0x2     ; Buffer board slot number
        ; (Must be one of 1,2,5,7,8,a,b)
0x5     ; Buffer board slot number
        ; (Must be one of 1,2,5,7,8,a,b)

(continued)

Figure 42. Sample io.conf for the I/O system diagrammed in Figure 41.

IO_device_configuration
5       ; Number of IO devices in system
dv2     ; hostname of IO device 1 (Also
        ; default host name)
DV      ; type of IO device 1
14      ; station id of IO device 1
100     ; bus id of IO device 1
dv1     ; hostname of IO device 2
DV      ; type of IO device 2
13      ; station id of IO device 2
102     ; bus id of IO device 2
dv1     ; hostname of IO device 3
DV      ; type of IO device 3
12      ; station id of IO device 3
104     ; bus id of IO device 3
hioc1   ; hostname of IO device 4
HIPPI   ; type of IO device 4
10      ; station id of IO device 4
104     ; bus id of IO device 4
iopg1   ; hostname of IO device 5
VME     ; type of IO device 5
9       ; station id of IO device 5
104     ; bus id of IO device 5

Figure 42, continued. Sample io.conf for the I/O system diagrammed in Figure 41.


J.1 The Channel_Board_Configuration Module 

io.conf must contain exactly one Channel_Board_Configuration module,
which describes the IOBAs. The Channel_Board_Configuration module
consists of submodules that describe the IOBAs' channel boards.

The first line of the Channel_Board_Configuration module specifies the
total number of channel boards in the system. Following this line are one or more
submodules: one submodule for each channel board. Each submodule must
contain eight lines, in this order:

■ The first line specifies the physical address of the NI on the controller
board in the same IOBA as the channel board.

■ The second line specifies the slot number of the channel board.

■ The third line specifies the bus ID of the CMIO bus to which the channel
board is connected.

■ The fourth line specifies the station ID of the channel board.

■ The fifth line specifies whether the channel board is the bus arbiter (1)
or not (0).

■ The sixth line specifies the CMIO bus's speed. This value is always 2.

■ The seventh and eighth lines each specify the slot number of one of the
two buffer boards associated with the channel board. It does not matter
which board's slot number is listed on the seventh line and which is
listed on the eighth.

The ordering of the submodules is arbitrary. That is, if there is more than one
IOBA in the system, you need not place the submodule for IOBA0's channel
board before the submodule for IOBA1's channel board, although it is
conventional to do so.
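
The eight-line submodule layout can be made concrete with a small Python sketch that reads the entries of a Channel_Board_Configuration module once comments and blank lines have been removed. It is an illustration under stated assumptions, not the parser used by the system: the class, field, and function names are hypothetical, and the entry values are taken from the Figure 42 sample as read here.

```python
from dataclasses import dataclass


def to_int(token):
    """Numeric entry: hex (leading 0x or 0X), octal (leading 0), or decimal."""
    if token[:2] in ("0x", "0X"):
        return int(token[2:], 16)
    if len(token) > 1 and token.startswith("0"):
        return int(token[1:], 8)
    return int(token, 10)


@dataclass
class ChannelBoard:
    ni_address: int      # line 1: physical address of the NI
    slot: int            # line 2: channel board slot number
    bus_id: int          # line 3: CMIO bus ID
    station_id: int      # line 4: station ID
    is_arbiter: bool     # line 5: 1 if the board is the bus arbiter, else 0
    bus_speed: int       # line 6: CMIO bus speed (always 2)
    buffer_slots: tuple  # lines 7 and 8: buffer board slot numbers


def parse_channel_boards(entries):
    """entries: the module's values in file order, starting with the board count."""
    count = to_int(entries[0])
    body = entries[1:]
    if len(body) != 8 * count:
        raise ValueError("expected eight lines per channel board")
    boards = []
    for i in range(count):
        v = [to_int(token) for token in body[8 * i:8 * i + 8]]
        boards.append(ChannelBoard(v[0], v[1], v[2], v[3], v[4] == 1, v[5], (v[6], v[7])))
    return boards


sample = ["2",
          "480", "0x0", "100", "1", "0", "2", "0x1", "0x2",
          "544", "0x0", "102", "2", "0", "2", "0x2", "0x5"]
for board in parse_channel_boards(sample):
    print(board)
```

Checking the entry count against eight times the number of boards catches the most likely editing mistake, a dropped or duplicated line, before any value is misread as a different field.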


J.2 The IO_device_configuration Module

io.conf must contain exactly one IO_device_configuration module. Its
submodules describe each I/O device (DataVault port, CM-IOPG, and
CM-HIPPI) configured into the system.

The first line of the IO_device_configuration module specifies how
many I/O devices are in the system. Be sure to count each configured DataVault
port as a separate device.

Each device must be described by exactly one submodule. Each submodule must 
contain four lines, in the order listed: 

■ The first line specifies the hostname of the device. 
■ The second line specifies a code that indicates the type of the device:

■ DV indicates a DataVault port.

■ VME indicates a CM-IOPG.

■ HIPPI indicates a CM-HIPPI.

■ The third line specifies the device's station ID.

■ The fourth line specifies the bus ID of the CMIO bus on which the device 
resides. 

The ordering of the submodules is arbitrary except as it is used by the CMFS file
system and I/O diagnostics to determine the default I/O device. The system
determines the default device by searching io.conf for the first channel board
that has at least one I/O device on its bus, and then, if there is more than one
device, choosing the one that appears first in the IO_device_configuration
module.
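
The default-device rule can be expressed as a short search. The Python sketch below follows the rule as stated in the paragraph above; it is not taken from the CMFS software, its names are hypothetical, and the device list reflects the Figure 42 sample as read here (hostnames and uppercase type codes included).

```python
from typing import NamedTuple


class Device(NamedTuple):
    hostname: str
    kind: str        # device type code
    station_id: int
    bus_id: int


def default_device(channel_bus_ids, devices):
    """Pick the default I/O device.

    channel_bus_ids: bus IDs of the channel boards, in io.conf order.
    devices: Device entries, in IO_device_configuration order.
    """
    for bus_id in channel_bus_ids:       # first channel board with a device
        for device in devices:           # first such device in file order
            if device.bus_id == bus_id:
                return device
    return None                          # no channel board has a device


devices = [
    Device("dv2", "DV", 14, 100),
    Device("dv1", "DV", 13, 102),
    Device("dv1", "DV", 12, 104),
    Device("hioc1", "HIPPI", 10, 104),
    Device("iopg1", "VME", 9, 104),
]
print(default_device([100, 102], devices).hostname)
```

With the sample channel boards on buses 100 and 102, the search selects dv2, which agrees with the comment in Figure 42 marking device 1 as the default host name.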
Appendix K

Error Reporting System 


K.1 Overview 

The CM-5 error reporting system provides useful information about failures dis¬ 
closed by cmdiag tests. When a diagnostic routine finds a hardware fault, the 
error system parses the error status of all visible components in the partition un¬ 
der test and, upon request, reports its findings. 

This report provides a summary description of each test failure and identifies
which module (circuit board) and individual components are implicated in the
failure.

Error messages are logged in diag-error-log.hostname in the local directory
on the associated Partition Manager, where hostname is the name given to the
Partition Manager.

NOTE: This discussion assumes that diagnostics are being run on a partition
rather than the entire CM. The description applies equally to a partition that
encompasses all of the Processing Nodes in the system.

The command to read the error system report on line is find-cm-error.

<CM-DIAG> find-cm-error

The error reporting system responds to this command by displaying the contents
of diag-error-log.hostname. Alternatively, you can read the error log
directly in a gmacs buffer or output it to a printer.

K.2 Interpreting Error System Reports

Figure 43 shows examples of the types of error messages to be found in
diag-error-log.hostname. The basic format is the same for all messages,
regardless of the type of error being reported or the type of hardware
associated with the error.

Section K.2.1 discusses the contents of the first error message example shown
in Figure 43, a Control Network error. It also explains the various features of the
message format that are common to all message types. Sections K.2.2 through
J.2.?? discuss other message types, with particular emphasis on their special
characteristics.


Global Address (Cabinet 0 Backplane 0 Slot 0) Type CN
Network Address (CN_Node Height 2 Leaf 1 Root 0)
ID_Prom [TYPE 03 REV 00 ID# 00ac] Pod_Type CN
Chip_Name CN_1 Chip_Type FEDEX IR_Scan 1000010101000000001
Register NODE-ESTAT-0 101111110111111111
Bit 1. NODE-0-UP-TYPE
Bit 8. NODE-0-UP-HARD-ERROR
Register NODE-ESTAT-1 101111110111111111
Bit 1. NODE-1-UP-TYPE
Bit 8. NODE-1-UP-HARD-ERROR
Register NODE-ESTAT-2 101111110111111111
Bit 1. NODE-2-UP-TYPE
Bit 8. NODE-2-UP-HARD-ERROR


Figure 43. diag-error-log example. 


K.2.1 Control Network Error Example 

The first four lines of all error messages are nearly identical. Their contents are
summarized below, using Figure 43 for reference.

■ The first line specifies the physical address of the hardware reporting the
error. For example, the first error shown in Figure 43 was detected in
slot 0 of backplane 0 in cabinet 0. The module occupying that location
is a CN board.

Global Address (Cabinet 0 Backplane 0 Slot 0) Type CN

■ The second line gives the network address of the error. In the first
example, the error was reported from the Control Network node at height 2,
leaf 1 attached to root 0.

Network Address (CN_Node Height 2 Leaf 1 Root 0)

With this information, you can find this node in the CN topology chart 
and understand its place in the failed operation’s CN communication 
path. 

■ The third line identifies the ID prom of the module that reported the error.
In the first error example, the ID prom is of type 03, revision level
00 and has the hexadecimal tag 00ac. It resides on a CN module (this
last piece of information is redundant with the first line).

ID_Prom [TYPE 03 REV 00 ID# 00ac] Pod_Type CN

Note, this information is intended primarily for long-term tracking of 
hardware failure patterns. It has no relevance to troubleshooting. 

■ The fourth line identifies the individual chip that is most closely associated
with the error report. In the first example, CN chip 1 of chip type
FEDEX is called out and its instruction register contents are scanned out.

Chip_Name CN_1 Chip_Type FEDEX IR_Scan 1000010101000000001
IR bit 0 is leftmost.

■ The remaining lines describe the error itself, displaying the contents of
each relevant error status register and explaining the meaning of each
relevant status bit. These lines will vary most from message to message,
depending on the type of error and the type of hardware reporting it.
These lines are discussed more fully below for the CN error example.
Sections J.2.3 through J.2.?? explain these lines for other error types.

The error example in Figure 43 shows a CN node reporting an error. This node
has three status registers: 0, 1, and 2. Figure 44 shows where these registers reside
in the node and how they relate to adjacent nodes in the network.

Register NODE-ESTAT-0 101111110111111111
    Bit 1, NODE-0-UP-TYPE
    Bit 8, NODE-0-UP-HARD-ERROR
Register NODE-ESTAT-1 101111110111111111
    Bit 1, NODE-1-UP-TYPE
    Bit 8, NODE-1-UP-HARD-ERROR
Register NODE-ESTAT-2 101111110111111111
    Bit 1, NODE-2-UP-TYPE
    Bit 8, NODE-2-UP-HARD-ERROR

Bit 1 of each register indicates that it received a faulty message (CRC value was 
invalid) from a node lower in the tree (from a child). Bit 8 indicates that this is 
the first node in the message path to detect the error. As this faulty message is 
propagated further through the CN, nodes subsequent to this one will set their soft 
error bits. 
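
Because every message shares the same leading lines, the log lends itself to scripted triage. The Python sketch below is illustrative only and is not part of the CM-5 diagnostic software. The function names parse_error_message and hard_error_registers are hypothetical, and the sample text is modeled on the Control Network message discussed above. The ID_Prom line is skipped because it is not used for troubleshooting.

```python
import re

# Sample text modeled on the Control Network message discussed above.
SAMPLE = """\
Global Address (Cabinet 0 Backplane 0 Slot 0) Type CN
Network Address (CN_NODE Height 2 Leaf 1 Root 0)
ID_Prom [TYPE 03 REV 00 ID# 00ac] Pod_Type CN
Chip_Name CN_1 Chip_Type FEDEX IR_Scan 1000010101000000001
Register NODE-ESTAT-0 101111110111111111
    Bit 1, NODE-0-UP-TYPE
    Bit 8, NODE-0-UP-HARD-ERROR
Register NODE-ESTAT-1 101111110111111111
    Bit 1, NODE-1-UP-TYPE
    Bit 8, NODE-1-UP-HARD-ERROR
"""


def parse_error_message(text):
    """Split one error message into location, network, chip, and register data."""
    msg = {"registers": []}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Global Address"):
            # Physical address: cabinet, backplane, slot, then the board type.
            cab, bp, slot = map(int, re.findall(r"(?:Cabinet|Backplane|Slot) (\d+)", line))
            msg["location"] = {"cabinet": cab, "backplane": bp, "slot": slot}
            msg["board_type"] = line.split()[-1]
        elif line.startswith("Network Address"):
            pairs = re.findall(r"(Height|Leaf|Root) (\d+)", line)
            msg["network"] = {key.lower(): int(val) for key, val in pairs}
        elif line.startswith("Chip_Name"):
            # The line is a flat run of "key value" pairs.
            fields = line.split()
            msg["chip"] = dict(zip(fields[0::2], fields[1::2]))
        elif line.startswith("Register"):
            _, name, bits = line.split()
            msg["registers"].append({"name": name, "value": bits, "flags": {}})
        elif line.startswith("Bit") and msg["registers"]:
            m = re.match(r"Bit (\d+)\W+(\S+)", line)
            if m:
                msg["registers"][-1]["flags"][int(m.group(1))] = m.group(2)
    return msg


def hard_error_registers(msg):
    """Names of registers that report a hard error (first detector of the fault)."""
    return [
        reg["name"]
        for reg in msg["registers"]
        if any(flag.endswith("HARD-ERROR") for flag in reg["flags"].values())
    ]
```

Calling hard_error_registers(parse_error_message(SAMPLE)) lists the registers whose hard-error bit is set, which is a reasonable starting point when locating the node in the CN topology chart.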



Figure 44. CN node configuration. 


Data Network Error Example 

(to be supplied) 
Appendix E 

dvtest5 Description 


dvtest5 

dvtest5, dvtest5-sparc, dvtest5-vu - User-level verifier for SDA File Systems
(SFS), IOBA File Systems (CMFS), and supporting hardware.


Syntax

dvtest5-sparc | dvtest5-vu [-x] [-t] [-1] [-g int1,...,intn]
        [-d directory-name] { [-a[1]] | -s | -h | [-l testname
        [testname] ...] }

-x                  Exit on error.

-t                  Report tersely.

-1                  Run selected test(s) one time only, rather than
                    looping forever.

-g int1,...,intn    Specify a geometry to be applied to the data being
                    transferred, using a string of one or more integers
                    separated by commas.

-d directory-name   Causes dvtest5 | dvtest5-vu to change directory
                    to directory-name before starting.

-a[1]               Run all tests automatically (no menu). This is the most
                    thorough exerciser. If -a1 is specified, the tests run
                    once. Otherwise they run forever; stop by pressing
                    Ctrl-C.

-s                  Run software test subset automatically (no menu).
                    -s is generally used only when new software has
                    been installed.

-h                  Run hardware test subset automatically (no menu).
                    -h is generally used during preventive maintenance.

-l testname(s)      Run the tests specified by testname. See the menu
                    illustration below.


 1. basics                test file creation/deletion (-s)
 2. data                  test simple file read/write (-s, -h)
 3. write                 test writing files (-s, -h)
 4. link                  test link/unlink (-s)
 5. abs_seek              test absolute seek (random) (-s, -h)
 6. rel_seek              test relative seek (deterministic) (-s, -h)
 7. dir_basics            test mkdir/rmdir/chdir (-s)
 8. many_dirs             test creating many directories (-s)
 9. many_files            test creating/deleting many files (-s)
10. serial_io             test serial I/O transfers (-s)
11. mixed_io              test mixed serial and parallel I/O (-s, -h)
12. parallel_partial      test parallel I/O with partial blocks (-s, -h)
13. transpose             test transposing serial data (-s)
14. transfer_timing       test transfer speed (-s, -h)
15. raw_transfer_timing   test raw transfer speed (-s, -h)
16. overhead_timing       test overhead speed (-s, -h)
17. max_transfer_timing   test max transfer speed
18. reliability           test reliability (-h)


Description

NOTE: dvtest5 has been replaced by dvtest5-sparc (for non-vector-unit
CM-5 systems) and dvtest5-vu (for CM-5 systems that have vector
units).

The dvtest5-sparc | dvtest5-vu program is an acceptance test that uses
either an SDA system or an IOBA (CMIO bus adapter) and a CMFS device to perform
all I/O functions available to user applications. Among other things, these
programs test every I/O data and control path, check Ethernet connections, and
open files and directories (in directory dvtest).

dvtest5-sparc | dvtest5-vu determine which device to use according to
the setting of CMFS_PATHTYPE.

■ If CMFS_PATHTYPE is set to unix, dvtest5-sparc | dvtest5-vu uses the
local UNIX or UNIX-compatible file system (the SDA, if the CM-5 system
contains one).

■ If CMFS_PATHTYPE is set to cmfs, dvtest5-sparc | dvtest5-vu uses a
CMFS file system, e.g., a DataVault, CM-HIPPI, CM5-HIPPI, VMEIO host
computer, or CM-IOPG. The program consults DVWD and, if necessary, DVHOSTNAME
to determine which CMFS device to use. If DVHOSTNAME and DVWD do not define
the default hostname, the program uses the default CMFS device for the first IOBA
listed with the kernel. If there are no IOBAs listed with the kernel, the program
consults the configuration file /usr/local/etc/dv_hostname. If that file is
missing, the program uses the CMFS file system device on the local host.

■ If CMFS_PATHTYPE is set to mixed, dvtest5-sparc | dvtest5-vu checks the
directory name specified via the -d flag. If the directory name is specified by a
pathname that does not contain a colon (:), the program uses the SDA. If the
pathname does contain a colon, the program checks for a CMFS-hostname component
(i.e., the string before the colon) of the pathname. If the pathname does contain a
CMFS-hostname component, the program uses that device. If the pathname does not
contain a CMFS-hostname component, the program follows the heuristic for
CMFS_PATHTYPE = cmfs, as described above.

■ If CMFS_PATHTYPE is not set, dvtest5-sparc | dvtest5-vu asks the kernel
what I/O hardware the system contains. If there is only an SDA, the program uses
it. If there is at least one CMFS device but no SDA, the program follows the heuristic
for CMFS_PATHTYPE = cmfs. If there is both an SDA and at least one CMFS device,
the program follows the heuristic for CMFS_PATHTYPE = mixed. If the kernel sees
no I/O hardware, the program follows the heuristic for CMFS_PATHTYPE = cmfs.
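
The selection logic above can be summarized as a small decision function. The Python sketch below is an illustration of the rules as stated, not the program's actual code. The function name choose_io_target, its parameters, and its return strings are hypothetical. It also assumes that a missing -d directory behaves like a pathname without a colon, which the text does not state.

```python
def choose_io_target(pathtype, directory="", has_sda=False, cmfs_devices=0):
    """Return which file system the verifier would use under the stated rules.

    pathtype      value of CMFS_PATHTYPE ("unix", "cmfs", "mixed") or None if unset
    directory     pathname given with -d (assumed empty when -d is omitted)
    has_sda       True if the kernel reports an SDA
    cmfs_devices  number of CMFS devices the kernel reports

    Returns "unix" (local UNIX file system / SDA), "cmfs:<host>" (named CMFS
    device), or "cmfs-default" (fall back to the DVWD / DVHOSTNAME defaulting).
    """
    if pathtype is None:
        # Unset: decide from the I/O hardware the kernel reports.
        if has_sda and cmfs_devices > 0:
            pathtype = "mixed"
        elif has_sda:
            pathtype = "unix"
        else:
            # Covers "CMFS only" and "no I/O hardware at all".
            pathtype = "cmfs"

    if pathtype == "unix":
        return "unix"
    if pathtype == "cmfs":
        return "cmfs-default"
    if pathtype == "mixed":
        if ":" not in directory:
            return "unix"
        host = directory.split(":", 1)[0]
        # An empty string before the colon means no hostname component.
        return f"cmfs:{host}" if host else "cmfs-default"
    raise ValueError(f"unrecognized CMFS_PATHTYPE: {pathtype!r}")
```

For example, choose_io_target("mixed", "dv1:/tests") selects the CMFS device named dv1, while choose_io_target("mixed", "/tests") selects the SDA.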


Requirements

dvtest5-sparc | dvtest5-vu must be run from a PM that controls a partition
in which ts-daemon is running. For CM-5 systems that contain CMFS
devices, also make sure that the io.conf file is correct and that the
fsserver is running in the background on all CMFS data storage devices with
which dvtest5-sparc | dvtest5-vu will communicate. For CM-5 systems
that contain an SDA, make certain that the SDA's SFS file system is mounted.

Figure 17. The dvtest5 menu.


dvtest5 Defaulting Rules

In systems with multiple IOBAs, dvtest5 applies the rules outlined below to
determine which IOBA and data-storage device to use. NOTE: Other programs
that use default-device methods to select an I/O device follow these rules as well.

■ If the environment variables DVWD and/or DVHOSTNAME specify a
hostname, dvtest5 uses those values to determine which data-storage
device it will use. It then determines the IOBA it will use by examining the
Channel_Board_Configuration module in /etc/io.conf and
uses the first channel board listed that is on the same bus as the data-storage
device. If there is no IOBA on the same bus, dvtest5 instead uses the
rule explained in the next bullet. See Appendix B for a description of
io.conf.

■ If neither DVWD nor DVHOSTNAME is set to indicate a target hostname,
dvtest5 examines the Channel_Board_Configuration module in
io.conf and uses the first channel board listed that has a data-storage
device on the same bus. If the bus has more than one data-storage device,
the program uses the device listed first in io.conf's
IO_device_configuration module.

In a standard I/O configuration, these defaulting rules allow all devices to be
tested via DVWD and/or DVHOSTNAME manipulation.
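
The two defaulting rules can be expressed as a short lookup. The Python sketch below is illustrative only. It does not read io.conf; instead it assumes the channel boards and data-storage devices have already been extracted into ordered lists of (name, bus) pairs, which is a hypothetical data model rather than the file's real format. The function name pick_ioba_and_device is also hypothetical.

```python
def pick_ioba_and_device(channel_boards, devices, target_host=None):
    """Apply the defaulting rules to choose an (IOBA, device) pair.

    channel_boards  list of (ioba_name, bus), in the order listed in io.conf
    devices         list of (hostname, bus), in the order listed in io.conf
    target_host     hostname taken from DVWD / DVHOSTNAME, or None if unset
    """
    if target_host is not None:
        # Rule 1: the device is fixed; use the first IOBA on the same bus.
        bus = next((b for host, b in devices if host == target_host), None)
        if bus is not None:
            for ioba, ioba_bus in channel_boards:
                if ioba_bus == bus:
                    return ioba, target_host
        # No IOBA shares a bus with the target: fall through to rule 2.

    # Rule 2: first channel board that has a device on its bus, paired with
    # the first such device listed.
    for ioba, ioba_bus in channel_boards:
        for host, dev_bus in devices:
            if dev_bus == ioba_bus:
                return ioba, host
    return None
```

With boards [("ioba0", "busA"), ("ioba1", "busB")] and devices [("dv1", "busB"), ("dv2", "busA")], naming dv1 as the target yields ("ioba1", "dv1"), and leaving the target unset yields ("ioba0", "dv2").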
cmdiag(8)               MAINTENANCE COMMANDS               cmdiag(8)


NAME 

/usr/diag/cmdiag - Run CM-5 hardware diagnostics.

SYNOPSIS 

cmdiag [ -p partition-name ] [ [ -g groupname ] | -C ] [ -Ebdfilt ] [ +Ebdfilt ]

DESCRIPTION 

cmdiag is the principal tool for diagnosing hardware problems in the CM-5. cmdiag provides five
major categories of functional tests: 

JTAG scan tests provide scan access to all internal components of Thinking Machines' proprietary
chips and boundary scan testing of all chip inputs and outputs.

Connectivity tests support connectivity checks between components in the scan chains, including 
connectivity across the control and data networks. 

Processing node tests evaluate the functionality of the PN circuits, including: the instruction
processor (SPARC chip), vector execution unit, memory controller, and network interface.

I/O Processor (IOP) tests exercise the various functions that comprise a CM I/O partition,
including: the I/O clock, I/O control, I/O interface to the data network, the I/O buffer, and the
I/O channel.

Verifier tests simulate the kind of activity a user application would impose on the CM. 

All functions and test routines are accessible via a single user interface. The user invokes the
diagnostics from a shell on a CP. Whenever possible, cmdiag should be executed from the CP that is the
master diagnostic processor (the CP connected by cable to the root node of the diagnostic network). Error
messages regarding hardware failures are sent to diag-error-log.hostname. The section RUNNING
CMDIAG ON A PARTITION gives a step-by-step explanation of how to run cmdiag.

cmdiag takes several optional switches. (See the section CMDIAG COMMAND-LINE SWITCHES,
below.) There are no required switches, although we recommend running diagnostics on a specific
partition by using the -p switch. Run without the -p switch, cmdiag runs on the entire machine.

When cmdiag is executed routinely after bringing up a partition, running the groups PE, global,
combine, and dr should be sufficient. Once a week or so, we recommend running the complete test suite by
creating a partition encompassing the entire machine and running cmdiag -p -m. Currently this takes
approximately two hours.

Executed without the -p, -m, -f, or -g switches, cmdiag immediately provides a diagnostic
environment, which is represented by the prompt <cmdiag>. This diagnostic environment supports a set of
diagnostic-related utilities and commands as well as the individual tests that comprise the predefined
diagnostic test groups. (The utilities, commands, tests, and test groups are listed in the section CMDIAG
TESTS AND COMMANDS.) To exit the diagnostic environment, type exit at the <cmdiag> prompt.


CMDIAG COMMAND-LINE SWITCHES
The switches are as follows:

-p    runs diagnostics on the specified partition.

-m    executes diagnostic manufacturing tests of CM-5 system components.

-f    executes diagnostic field tests of CM-5 system components. Field tests are a subset of the
      manufacturing tests.

-g    executes tests for groupname only. For a list of groupnames, see the section CMDIAG
      TESTS AND COMMANDS.

-C    enables command completion within the diagnostics environment.

+E    activates diagnostic environment options:
          b  = Break on error (default)
          d# = Display error count (default = 16)
          f  = Loop forever
          i  = Ignore errors
          l  = Log errors (default)
          t  = Display trace

-E    deactivates diagnostic environment options (see +E).
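
To make the +E and -E switches concrete, the following Python sketch models how a set of such switches might change the environment options from their documented defaults. It is an illustration only, not cmdiag's implementation. The function name apply_env_switches is hypothetical, and two behaviors are assumptions: that several option letters may be combined in one switch (as the SYNOPSIS suggests), and that -Ed is treated as an error count of 0.

```python
import re

# Documented defaults: break on error and log errors are on; error count is 16.
DEFAULTS = {"b": True, "d": 16, "f": False, "i": False, "l": True, "t": False}


def apply_env_switches(switches, env=None):
    """Return the option settings after applying +E... / -E... switches in order."""
    env = dict(DEFAULTS if env is None else env)
    for switch in switches:
        # Only the letter d may carry a numeric argument (the "d#" form).
        m = re.fullmatch(r"([+-])E((?:[bfilt]|d\d*)+)", switch)
        if not m:
            raise ValueError(f"not an environment switch: {switch!r}")
        turn_on = m.group(1) == "+"
        for letter, count in re.findall(r"([bdfilt])(\d*)", m.group(2)):
            if letter == "d":
                if turn_on:
                    env["d"] = int(count) if count else DEFAULTS["d"]
                else:
                    env["d"] = 0  # assumption: -Ed suppresses the error display
            else:
                env[letter] = turn_on
    return env
```

For example, apply_env_switches(["+Eft", "-Eb"]) turns on looping and tracing and turns off break-on-error, leaving logging and the error count at their defaults.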


RUNNING CMDIAG ON A PARTITION 

1. Execute cmpartition stop to halt the timesharing daemon running on the partition.

2. Reset the partition's registers and switches by executing /usr/diag/cmreset.

3. Reset the interface to the partition manager by executing /usr/diag/cmreset -s.

4. Check that the pertinent environment variables are set correctly (see the section ENVIRONMENT
VARIABLES). In particular, if you must run the diagnostics from a CP that is not the system
console/master diagnostic processor, be sure the JTAG_SERVER environment variable is set
appropriately.

5. (This step is necessary only if the hardware has changed, requiring an edit of
etc/cm/configuration/hardware.install.) Check the directory defined by the CMDIAG_PATH
environment variable to see if there are any files that must be deleted. Delete all files whose names
contain the hostname of the CP from which you are executing cmdiag.

6. Execute cmdiag. Usually, running a few test groups via the following syntax is sufficient:

syscon% /usr/diag/cmdiag -p partition-name -f -gPE -gglobal -gcombine -gdr

Analyze any failure reports; descriptive error messages are sent to diag-error-log.hostname in your
current directory. Rerun any appropriate tests.

7. Delete diag-error-log.hostname when its contents are no longer needed.
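
Steps 5 and 6 are easy to script around. The Python sketch below is illustrative only: it builds the argument list for the routine invocation shown in step 6 and lists the files in the CMDIAG_PATH directory that step 5 says to delete. The function names cmdiag_argv and stale_files are hypothetical, and the sketch deliberately neither runs cmdiag nor deletes anything.

```python
import os


def cmdiag_argv(partition, groups=("PE", "global", "combine", "dr"), field_tests=True):
    """Build the step 6 command line as an argument list (nothing is executed)."""
    argv = ["/usr/diag/cmdiag", "-p", partition]
    if field_tests:
        argv.append("-f")
    # Each group gets its own -g switch, written without a space as in step 6.
    argv += [f"-g{group}" for group in groups]
    return argv


def stale_files(cmdiag_path, hostname):
    """List files in the CMDIAG_PATH directory whose names contain the CP hostname."""
    return sorted(
        name
        for name in os.listdir(cmdiag_path)
        if hostname in name and os.path.isfile(os.path.join(cmdiag_path, name))
    )
```

A caller would typically pass os.environ["CMDIAG_PATH"] and the local hostname to stale_files, review the resulting list, and only then remove the files by hand.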


CMDIAG TESTS AND COMMANDS 

This section lists the names of the tests and commands that cmdiag can run. The first subsection
categorizes the tests by groupname (see the -g switch, above). The second subsection lists all tests and 
commands categorized by which part of the machine they serve to diagnose. 


Test Groups 

Group : SVME (Tests SVME hardware.) 

1. test-svme-serial-data-path
2. test-svme-id-prom
3. test-svme-ni-latch-drive
4. test-svme-ni-latch-sample
5. test-svme-ni-chip

Group : SNI

1. test-sni-serial-data-path
2. test-sni-id-prom
3. test-sni-lcd-reg
4. test-sni-ni-chip

Group : CLKDN

1. test-clkdn-serial-data-path
2. test-clkdn-analog-env-data
3. test-clkdn-analog-env-control
4. test-clkdn-digital-env-data
5. test-clkdn-csr
6. test-clkdn-id-prom
7. test-clkdn-pll-control
8. test-clkdn-pod-status

Group : CLKBUF

1. test-clkbuf-serial-data-path
2. test-clkbuf-analog-env-data
3. test-clkbuf-analog-env-control
4. test-clkbuf-digital-env-data
5. test-clkbuf-csr
6. test-clkbuf-pod-status
7. test-clkbuf-id-prom

Group : IOCLK

1. test-ioclk-serial-data-path
2. test-ioclk-analog-env-data
3. test-ioclk-analog-env-control
4. test-ioclk-digital-env-data
5. test-ioclk-id-prom
6. test-ioclk-cn-switch
7. test-ioclk-cn-chip

Group : SASYS

1. test-sasys-serial-data-path
2. test-sasys-analog-env-data
3. test-sasys-analog-env-control
4. test-sasys-digital-env-data
5. test-sasys-csr
6. test-sasys-pod-status
7. test-sasys-pll-control
8. test-sasys-cn-chip
9. test-sasys-dr-chip
10. test-sasys-drive-sync-control
11. test-sasys-drive-sync-data
12. test-sasys-id-prom

Group : SPI

1. test-spi-serial-data-path
2. test-spi-id-prom
3. test-spi-dr-chip
4. test-spi-cn-chip

Group : DR

1. test-dr-serial-data-path
2. test-dr-id-prom
3. test-dr-dr-chip
4. test-dr5-serial-data-path
5. test-dr5-id-prom
6. test-dr5-dr-chip
7. test-dr-dr-bsr
8. test-afd-bsr

Group : CN

1. test-cn-serial-data-path
2. test-cn-switch
3. test-cn-id-prom
4. test-cn-chip
5. test-cn-cn-bsr

Group : FILLER

1. test-filler-serial-data-path
2. test-filler-dr-chip
3. test-filler-id-prom

Group : CMPE (Tests ability of PNs to interact with NI, DR, and CN hardware.)

1. test-pe-serial-data-path
2. test-pe-ni-chip
3. test-pe-dr-chip
4. test-pe-cn-chip
5. test-pe-id-prom

Group : PEMEM

1. test-pemem-serial-data-path
2. test-pemem-mc-chip
3. test-pemem-id-prom

Group : IOCNTRL

1. test-iocntrl-serial-data-path
2. test-iocntrl-id-prom
3. test-iocntrl-mc-chip
4. test-iocntrl-ni-chip

Group : IODR

1. test-iodr-serial-data-path
2. test-iodr-dr-chip
3. test-iodr-id-prom

Group : IOBUF

1. test-iobuf-id-prom
2. test-iobuf-ni-chip
3. test-iobuf-pbus-buffer
4. test-iobuf-serial-data-path
5. test-iobuf-xbus-buffer
6. test-iobuf-xbus-data-in
7. test-iobuf-xbus-data-out

Group : IOCHNL

1. test-iochnl-cmio-cntrl-out
2. test-iochnl-cmio-cntrl-in
3. test-iochnl-id-prom
4. test-iochnl-pbus-buffer
5. test-iochnl-response-data
6. test-iochnl-serial-data-path
7. test-iochnl-xbus-buffer
8. test-iochnl-xbus-data-in
9. test-iochnl-xbus-data-out

Group : SAC

1. test-sac-serial-data-path
2. test-sac-mc-chip
3. test-sac-ni-chip
4. test-sac-id-prom

Group : SADR

1. test-sadr-serial-data-path
2. test-sadr-dr-chip
3. test-sac-id-prom

Group : IOP-CNTRL

1. reset-system
2. initialize-pe-memory
3. load-secondary-boot
4. clear-pe-memory
5. load-iopcntrl-tests
6. execute-all-iopcntrl-tests

Group : IOP-BUF

1. reset-system
2. initialize-pe-memory
3. load-secondary-boot
4. clear-pe-memory
5. load-iopbuf-tests
6. execute-all-iopbuf-tests

Group : IOP-CHNL

1. reset-system
2. initialize-pe-memory
3. load-secondary-boot
4. clear-pe-memory
5. load-iopchnl-tests
6. execute-all-iopchnl-tests

Group : IOP-SYS

1. reset-system
2. initialize-pe-memory
3. load-secondary-boot
4. clear-pe-memory
5. load-iopsys-tests
6. execute-all-iopsys-tests

Group : PE

1. reset-system
2. test-jtag-backdoor-connection
3. test-jtag-backdoor-interrupt-clear
4. test-jtag-backdoor-request-clear
5. test-jtag-backdoor-command-channel
6. test-mc-register-read
7. test-cmu-boot-mode
8. test-broadcast-interrupt-receive
9. test-mc-reduce
10. initialize-pe-memory
11. load-secondary-boot
12. test-cmu-run-mode
13. clear-pe-memory
14. test-cmu-run-mode
15. router-init
16. test-pe-memory
17. load-file petests
18. test-cmu-run-mode
19. test-mc-reduce
20. execute-all-pe-tests
21. test-cmu-boot-mode

Group : global (Verifies CM-5's ability to perform global communication operations.)

1. reset-and-load-for-test-group
2. test-cn-async-global-supervisor
3. test-cn-async-global-user
4. test-cn-sync-global
5. test-cn-sync-global-roll-call

Group : broadcast

1. reset-and-load-for-test-group
2. test-broadcast-scalar-send-enable
3. test-broadcast-scalar-supervisor
4. test-broadcast-scalar-user
5. test-broadcast-pn-supervisor
6. test-broadcast-pn-user
7. test-broadcast-interrupt-scalar-send
8. test-broadcast-interrupt-pn-send

Group : combine

1. reset-and-load-for-test-group
2. test-combine-pn-data-is-one
3. test-combine-pn-data-is-zero
4. test-combine-pn-multiword-carry
5. test-combine-pn-multiple-stacked-scan
6. test-combine-pn-overflow-detection
7. test-combine-pn-segmented-scan
8. test-combine-reduce-to-scalar
9. test-combine-int-on-rec-ok
10. test-combine-flush

Group : dr (Verifies CM-5's ability to transfer messages across the data network.)

1. reset-and-load-for-test-group
2. test-dr-scalar-to-pe
3. test-dr-tag-scalar-send
4. test-dr-pn-to-scalar
5. test-dr-tag-pn-send
6. test-dr-length-scalar-send
7. test-dr-length-pn-send
8. test-dr-pn-static-send
9. test-dr-pn-dynamic-send
10. test-dr-flow-control-pn-to-pn
11. test-dr-rec-stop
12. test-dr-afd-router-empty-pn
13. test-dr-afd-router-empty-scalar
14. test-dr-afd-router-full
15. test-dr-int-rec-ok-scalar
16. test-dr-int-rec-ok-pn

Group : partition (Verifies CM-5's ability to perform global, broadcast, combine, and
DN operations within a partition.)

1. reset-and-load-for-test-group
2. test-partition-global-scalar-static
3. test-partition-global-pe-static
4. test-partition-global-dynamic
5. test-partition-combine-scalar-static
6. test-partition-combine-pe-static
7. test-partition-combine-dynamic
8. test-partition-broadcast-scalar-static
9. test-partition-broadcast-pe-static
10. test-partition-broadcast-dynamic
11. test-partition-dr-scalar-afd
12. test-partition-dr-pe-afd


Comprehensive List of Commands and Tests

--General Utilities--

continue-from-abort
list-commands
script
set-diag-environment
show-diag-environment
alias
help
run-groups
setenv
shell
getenv
list-groups
silent-script
show-all-errors
whatis

--JTAG Utilities--

find-cm-error
find-cm-pn-error
find-cm-cn-error
find-cm-dr-error

--JTAG Status Commands--

show-all-pe-status
show-pe-status
show-spi-status
show-cn-status
show-dr-status
show-svme-status

--JTAG Config Commands--

configure-all-pes
configure-all-cns
configure-dr
enable-auto-reset
configure-pe
configure-cn
configure-filler
reset-quad
configure-spi
configure-all-drs
configure-svme

--CMIO HIPPI Commands--

close-cmio-diag-connection
cmiohippi-data-pattern-xfer
cm5-write-datavault
cm5-read-vmeio
cmio-vmeio-iope-all-pattern-xfer
cmio-vmeio-memory-iope-all-pattern-xfer
cmio-vmeio-iope-all-mode-xfer
cmiohippi-sm-src-dst-sm-ifield-xfer
cmiohippi-sm-iop-src-dst-sameiop-sm-xfer
creat-cmiohippi-comparsion-data
get-cmioc-bus-id
select-next-cmiohippi-iop-on-cmio-bus
select-cmio-server
set-cmiohippi-check-parity
show-corrupted-data-on-hippi
cmiohippi-cmio-data-all-path-xfer
cmio-vmeio-memory-iope-all-mode-xfer
cmio-dv-iope-all-pattern-xfer
cmiohippi-sm-iop-src-dst-diffiop-sm-xfer
cmio-vmeio-iope-xfer
cmio-vmeio-memory-iope-xfer
cmiohippi-select-ports
cmiohippi-sm-src-dst-sm-xfer
cmiohippi-sm-iop-src-sm-xfer
data-xfer-on-all-iobuf-chnl
get-iope-config
reset-cmiohippi
set-cmiohippi-arbiter
set-iop-buffer-and-chnl
test-cmio-device-data-xfer
cmiohippi-cmio-data-xfer
cmiohippi-coldboot
cmio-dv-iope-xfer
cm5-write-vmeio
cm5-read-datavault
cm5-read-hippi
cmiohippi-sm-dst-iop-sm-xfer
cm5-write-hippi
cmiohippi-standlone-tests
establish-cmio-diag-connection
get-station-ids-on-bus
init-cmio-diag-environment
set-cmiohippi-check-data
show-cmiohippi-diag-environment

--JTAG Scan Commands--

add-multi-chip-sample
instantiate-multi-chip-scan
nb-scan-in-pod-udr
sample-pod-udr
add-multi-chip-scan
multi-chip-sample
read-id-prom-by-pod-name
scan-in-pod-register
instantiate-multi-chip-sample
multi-chip-scan
sample-pod-register
scan-in-pod-udr
scan-dr

--JTAG DN Commands--

autosize
jtag-reset
server-connect
dn-reset
jtag-run-test-idle
server-disconnect
dn-reset-io-clear
phoenix-ir-test
test-dn-channel-reg

--JTAG Equip. Set Commands--

build-autosizing-file
check-count-and-first
compare-autosizing-file
load-diag-partition
read-all-pod-id-prom-in-partition
show-diag-partition
build-cbs-diag-partition
check-equipment-set
generat-na-partition
print-all-pod-id-prom
show-all-pod-na-list
show-hlr-partition
build-equipment-set
clear-diag-partition
load-autosizing-file
read-all-pod-id-prom
show-autosizing
walk-itdn-tree

--JTAG Filler Commands--

select-filler
test-filler-serial-data-path
test-filler-dr-chip
select-all-fillers
test-filler-id-prom

--JTAG SASYS Commands--

reset-sasys
select-sasys
test-sasys-cn-chip
test-sasys-dr-chip
test-sasys-id-prom
test-sasys-serial-data-path
sample-sasys-csr
test-sasys-analog-env-control
test-sasys-csr
test-sasys-drive-sync-control
test-sasys-pll-control
write-sasys-pod-status
sample-sasys-pod-status
test-sasys-analog-env-data
test-sasys-digital-env-data
test-sasys-drive-sync-data
test-sasys-pod-status

--JTAG SADR Commands--

select-sadr
test-sadr-serial-data-path
test-sadr-id-prom
test-sadr-dr-chip

--JTAG SAC Commands--

select-sac
test-sac-ni-chip
test-sac-id-prom
test-sac-serial-data-path
test-sac-mc-chip

--JTAG IOCLK Commands--

reset-ioclk
select-ioclk
test-ioclk-cn-chip
test-ioclk-digital-env-data
test-ioclk-serial-data-path
sample-ioclk-csr
test-ioclk-analog-env-control
test-ioclk-cn-switch
test-ioclk-id-prom
write-ioclk-pod-status
sample-ioclk-pod-status
test-ioclk-analog-env-data
test-ioclk-csr
test-ioclk-pod-status

--JTAG IODR Commands--

select-iodr
test-iodr-serial-data-path
test-iodr-id-prom
test-iodr-dr-chip

--JTAG IOCHNL Commands--

select-iochnl
test-iochnl-id-prom
test-iochnl-serial-data-path
test-iochnl-xbus-data-out
test-iochnl-cmio-cntrl-out
test-iochnl-pbus-buffer
test-iochnl-xbus-buffer
test-iochnl-cmio-cntrl-in
test-iochnl-response-data
test-iochnl-xbus-data-in

--JTAG IOBUF Commands--

select-iobuf
test-iobuf-id-prom
test-iobuf-ni-chip
test-iobuf-pbus-buffer
test-iobuf-serial-data-path
test-iobuf-xbus-buffer
test-iobuf-xbus-data-in
test-iobuf-xbus-data-out

--JTAG IOCNTRL Commands--

select-iocntrl
test-iocntrl-id-prom
test-iocntrl-mc-chip
test-iocntrl-ni-chip
test-iocntrl-serial-data-path

--JTAG CLKBUF Commands--

reset-clkbuf
sample-clkbuf-csr
sample-clkbuf-pod-status
select-clkbuf
test-clkbuf-analog-env-control
test-clkbuf-analog-env-data
test-clkbuf-csr
test-clkbuf-digital-env-data
test-clkbuf-id-prom
test-clkbuf-pod-status
test-clkbuf-serial-data-path
write-clkbuf-pod-status

--JTAG DR5 Commands--

select-dr5
test-dr5-dr-chip
test-dr5-id-prom
test-dr5-serial-data-path

--JTAG DR Commands--

select-dr
select-all-drs
test-dr-dr-chip
test-dr-id-prom
test-afd-bsr
test-dr-serial-data-path
test-dr-dr-bsr

--JTAG CN Commands--

select-cn
select-all-cns
test-cn-cn-bsr
test-cn-chip
test-cn-switch
test-cn-id-prom
test-cn-serial-data-path

--JTAG PEMEM Commands--

select-all-pemems
select-pemem
test-pe-pemem-bsr
test-pemem-id-prom
test-pemem-mc-chip
test-pemem-serial-data-path

--JTAG PE Commands--

select-all-pes
select-pe
test-pe-cn-chip
test-pe-dr-chip
test-pe-id-prom
test-pe-ni-chip
test-pe-serial-data-path


--JTAG CLKDN Commands--

reset-net-clock-switch
sample-clkdn-pll-control
reset-all-clkdns
test-clkdn-analog-env-control
test-clkdn-digital-env-data
test-clkdn-pod-status
reset-clkdn
sample-clkdn-pod-status
set-clkdn-pll
test-clkdn-analog-env-data
test-clkdn-id-prom
test-clkdn-serial-data-path
sample-clkdn-csr
select-clkdn
set-net-clock-switch
test-clkdn-csr
test-clkdn-pll-control
write-clkdn-pod-status

--JTAG SPI Commands--

select-spi
test-spi-id-prom
test-spi-cn-chip
test-spi-serial-data-path
test-spi-dr-chip

--JTAG SNI Commands--

select-sni
test-sni-ni-chip
test-sni-id-prom
test-sni-serial-data-path
test-sni-lcd-reg

--JTAG SVME Commands--

connect-svme
read-word
test-svme-ni-chip
test-svme-serial-data-path
read-byte
select-svme
test-svme-ni-latch-sample
write-long
read-long
test-svme-id-prom
test-svme-ni-latch-drive
write-word

wr-rd-ver-dn-channel-reg


-PM Diag Utilities- 


verbose 

disable-pm-board 

map-in-pe 

mapout-pe-board 

rcad-m-rcgisier 

req uest-c om bine - d ump 

reset-system 

safety 

cnabtc-pc 

map-out-pe 
map-in-pe-backplanc 
writc-ni-register 
r eques uleft-rou ter- d u mp 
rcsct-svme 

enable-prri’board 
disable-pc 
map-in-pc-board 
map-out -pe- bac kplane 
request -bac kdoor- dump 
request-riglu-router-dump 
quit 


-PM Diag Router Ulililies- 


router-inil 
dump-chunk-table 
sc n d- le fl-rou ter- m es sage 

sct-sclf-address 
rcad-mcmory-using-ldr 
send-right-rou ter- me ssage 

chcck-sclf-address 

read-memory-using-rdr 


-PM Diag Tests-- 


imtialize-pe-mcmory 

elear-pc-memory 

load-sccondary-boot 
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cmdiag (8) 


MAINTENANCE COMMANDS 


cmdiag (8) 


test-pe-memory 

load-iopbuf-tests 

load-iopsys-tests 

exec u le^single- pe- tests 

execute-ali-topbuf-tcsts 

exccute-all-iopdv-tcsis 

load-file 


test-j tag -back door-e onne clion 

lest-jlag-backdoor-command-channel 

test-mc-reduce 


reduce-memory 

wriLfMnemory 

red uce-mc-register 

write-mc-registcr 

read-mc-prom 

dump-pc-prom 

read-emu-register 

reduce-io 

writc-io 

set 

write-double 

select-buffcr-board 

set-staLion-id 

loop-read 

loop-write-read 

read-pes-backdoor 

ex tr ac t- m essage - buffcr-rdr 

call-sing le-pe-func tion 

call-diag-funciion 


broadcast-uscr-data 

monitor-sbc-receive 

pc-read-memory 

pe-wri te-n i-reg i stcr 

send - data-to-o th e r- node 

show-all-acc css iblc -reg- nam cs 

$ho w-c hu nk- tabl c -d a ta 

tell-pe-to-eombinc 

tcll-pc-to-drain-dr 

tfill-pe-to-rcad-broadcast 

tcll-pe-to-scnd-dr 

v fr-s et up-address-tablcs 


load-pe*tesls 

load-iopchnl-tcsts 

load-iopdv-tcsts 

execute-pe-tcsts 

execute-all-iopchnl-tests 

ex ec u le-all - ioppe-tes is 

load-iope-file 

--PM Diag Config Tcsts- 

test-j tag - backdoor- in term pt-clear 

tea-mc-register-read 

test-emu-boot-mode 


-PM Diag Debug Commands- 

SC l-memory 

dump-tnemory 

sc t-mc-register 

dump-mc-registcr 

d um p-mc-prom 

rcduce-cmu-register 

writc-cmu-rcgister 

setrio 

dump-io 

write 

read-double 
sc I cct-c ha nncl-boar d 
seim-iop-ver-data 
loop-write-double 
loop-writc-read-dotiblc 
extraci-mcs sage-buffer 
d um p-cm u - re sci-siate 
loolcup-pe-symbol 
] ook u p-d i ag-s y mbo l 

--Verifier Support Functions--

broadcast-supervisor-data
monitor-dr-receive
pe-read-ni-register
pe-write-ni-register-fast
set-user-symbols
show-all-user-symbols
show-scalar-ni-register
tell-pe-to-dr-loop-drain
tell-pe-to-fill-dr
tell-pe-to-read-combine
vfr-diagnose-dr-pe-to-pe
vme-hardware-debugger


load-iop-tests
load-iopcntrl-tests
load-ioppe-tests
execute-all-iopcntrl-tests
execute-all-iopsys-tests
execute-all-pe-tests

test-jtag-backdoor-request-cl
test-broadcast-interrupt-recei
test-emu-run-mode

read-memory
load-memory
read-mc-register
reduce-mc-prom
diff-pe-prom
set-emu-register
dump-emu-register
read-io
reduce
read
dump
set-start-block
loop-write
loop-read-double
write-pes-backdoor
extract-message-buffer-ldr
cmp-use-control-net
load-pe-cmp-map
load-symbol-table

monitor-bc-receive
pe-extract-message
pe-write-memory
query-all-pe-error-status
setup-pe-address
show-pe-memory
tell-pe-to-broadcast
tell-pe-to-dr-loop-send
tell-pe-to-check-flow
tell-pe-to-read-dr
vfr-make-pe-send-dr
write-scalar-ni-register
just-load-no-reset
reload-ndiag-pemon
run-all-vfr-tests

run-all-broadcast-tests
test-broadcast-scalar-send-enable
test-broadcast-pn-supervisor

run-all-global-tests
test-cn-async-global-user

test-combine-int-on-rec-ok
test-combine-pn-multiple-stacked-scan
test-combine-pn-segmented-scan

run-all-dr-tests
test-dr-afd-router-empty-scalar
test-dr-int-rec-ok-scalar
test-dr-length-pn-send
test-dr-pn-to-scalar
test-dr-tag-scalar-send

test-partition-broadcast-dynamic
test-partition-combine-scalar-static
test-partition-dr-pe-afd
test-partition-global-scalar-static

ni-access-test-interrupt-reg
ni-access-test-reg-after-reset
ni-access-test-writable-fields
ni-broadcast-test-single-word


--Verifier Init Function--

load-chunk-table-data
reset-and-load-for-test-group

--Verifier Broadcast Tests--

test-broadcast-interrupt-pn-send
test-broadcast-scalar-supervisor
test-broadcast-pn-use

--Verifier Global Tests--

vfr-diagnose-async-global
test-cn-sync-global

--Verifier Combiner Tests--

vfr-diagnose-combine-reduce-to-scalar
test-combine-pn-data-is-one
test-combine-flush
test-combine-reduce-to-scalar

--Verifier Data Router Tests--

test-dr-afd-router-full
test-dr-flow-control-pn-to-pn
test-dr-int-rec-ok-pn
test-dr-pn-dynamic-send
test-dr-rec-stop

--Verifier Partitioning Tests--

test-partition-broadcast-pe-static
test-partition-combine-pe-static
test-partition-global-dynamic

--Verifier SVME Board Tests--

ni-access-test-readable-writable-reg
ni-access-test-all
ni-broadcast-full-test
ni-broadcast-test-write-rfifo


load-ndiag-pemon
restart-ndiag-pemon

test-broadcast-interrupt-scal
test-broadcast-scalar-user

vfr-diagnose-sync-global
test-cn-sync-global-roll-call

vfr-diagnose-combine
test-combine-pn-data-is-zero
test-combine-pn-overflow-dt
test-combine-pn-multiword-

test-dr-afd-router-empty-pn
test-dr-length-scalar-send
test-dr-pn-static-send
test-dr-scalar-to-pe

test-partition-broadcast-scal
test-partition-dr-scalar-afd
test-partition-global-pe-static

ni-access-test-rec-fifo
ni-access-test-send-fifo
ni-broadcast-test-rec-abstain
ni-combine-test-legal-pattern


ni-combine-test-illegal-patterns
set-vme-int-enable-bit
test-all-ni-registers
test-data-reg-access
test-int-force-on-off
test-ni-word0-latch-access
test-register

sa-auto-reset-partition
sa-force-sync-global-complete
sa-test-open
sa-test-isolate-dr
sa-test-disconnect-cn

reset-vme-int-enable-bit
show-reg-test-result
test-all-vme-interface-registers
test-dn-parent-reg-access
test-ni-presence
test-ni-word1-latch-access

--SA Library Interface Tests--

sa-disable-control-net
sa-set-all-com-flush-send
sa-test-close
sa-test-connect-dr
sa-test-get-components

set-test-attribute
test-all-registers
test-command-reg-access
test-dn-child-reg-access
test-ni-reset-condition
test-reset-reg

sa-disable-control-net
sa-set-all-com-control
sa-test-reset-partition
sa-test-connect-cn


ENVIRONMENT VARIABLES 

Set the environment variables below for the current system configuration. In most cases the default
values will be correct.

SVMEDEV number

This variable tells the device driver which SVME to talk to. A DIP switch setting on the SVME
board determines the value of number, which can be 0, 1, 2, or 3. For example, if the DIP is set to
00001000 (where 1 is up) then the board is SVME0. The default is 0.

CMDIAG_PATH pathname

This variable tells cmdiag where to find various descriptors and the files it uses. The default is ./.

PM_OBJECT_PATH pathname

This variable tells cmdiag where to look for the files that will be downloaded into each processing
node. The default is ./object/.

SVME_RESET

This variable defines whether the SVME board is reset upon execution of cmdiag. The default is no
reset.

JTAG_SERVER hostname

This variable tells cmdiag where the jtagserver is running, usually on the system console/master
diagnostic processor. Setting this variable is not required if you are running cmdiag from the master
diagnostic processor.

JTAG_RESET_FILE filename

filename specifies the reset script used to do a reset. The filename that is in effect at system
installation should not be changed.
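
The SVMEDEV rule above (a board number of 0, 1, 2, or 3, defaulting to 0) can be checked before cmdiag is started. The following is a minimal sketch in Python, not part of cmdiag; the function name svme_device_number is hypothetical and is used here only for illustration.

```python
import os

# Documented range for SVMEDEV: the SVME board number is 0, 1, 2, or 3.
VALID_SVME_NUMBERS = range(4)


def svme_device_number(env=None):
    """Return the SVME board number from SVMEDEV, using the documented default of 0.

    `env` is any mapping of variable names to strings; it defaults to the
    process environment. This helper is hypothetical, not a cmdiag interface.
    """
    env = os.environ if env is None else env
    raw = env.get("SVMEDEV", "0")
    number = int(raw)  # raises ValueError for non-numeric text
    if number not in VALID_SVME_NUMBERS:
        raise ValueError(f"SVMEDEV must be 0, 1, 2, or 3; got {number}")
    return number
```

Passing a plain dictionary instead of the real environment makes the check easy to try without changing any shell settings.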


RESTRICTIONS 

cmdiag and the timesharing daemon cannot run on the same partition. 
NAME

/usr/etc/cmpartition - partition Connection Machine (CM-5, CM-5-LD) hardware resources

SYNOPSIS

cmpartition list [-l]

cmpartition create [-pm hostname] [-name partition_name]
(-size n | -pn_range range [-pn_range range])
[-description partition_description]
[-iop integer_address]

cmpartition start [-pm hostname] [-name partition_name]
[-n integer] [-reva]
-cmd command_name command_arg1 ... command_argn

cmpartition stop [-pm hostname] [-name partition_name]

cmpartition delete [-pm hostname] [-name partition_name]

DESCRIPTION 

cmpartition is the principal system administration interface for configuring the CM-5 and CM-5-LD
processor and network hardware into usable resources known as partitions.

Partitions are mutually disjoint subsets of the Connection Machine hardware that execute
independent copies of CMOST, the Connection Machine operating system. CMOST in turn
schedules and manages all user processes within the partition.

A partition is minimally defined by a single control processor designated as the Partition
Manager (PM) and a set of parallel processing nodes (PNs), plus the nodes of the control
network (CN) that link all of the processors into a common communications domain. Once this
set of connections is created among the specified processors, it persists until it is explicitly
deleted or the hardware is reset. Typically, partitions are torn down and recreated on the order
of a few times a day.

In order for the PM to make use of the processor nodes assigned to its partition, it must notify
its copy of the CMOST kernel of their number and locations. It must then download the kernel
image to be run on the parallel processors in the partition. Lastly, it must start up the
timesharing access mode for user programs on the PM. (Note: Currently the starting and
stopping of a particular partition requires direct access to the CMOST kernel on the partition
manager; thus these operations must be performed on the partition manager itself using the rsh
command from the system console.)

cmpartition comprises a set of five commands that control the various aspects of partition
management. Only one of the commands, cmpartition list, can be executed without root privileges.
The cmpartition commands are:

cmpartition list

This command prints on the standard output a short description of the Connection Machine
hardware, followed by a short list of all currently configured partitions and their attributes. This
is the default subcommand; that is, cmpartition is equivalent to cmpartition list.

-l

Prints an expanded list of partition attributes.

cmpartition create

This command allocates and reserves Connection Machine resources for the new partition by
editing the file /etc/cm/configuration/partitions.current. To bring up a partition, it must be both
created and started. The cmpartition create command must be called from the system console.

-pm hostname

The hostname of the unique control processor associated with the partition. This
control processor will be the partition manager for this partition. If this switch is not
included on the command line, by default the partition's PM is the control processor
on which cmpartition create is executed.

-name partition_name

A unique name for the partition. There is no default value.

-description partition_description

A string that tells users about the partition. The description is included in the output
of cmpartition list -l.

-size n

The number of PNs to be configured in the new partition. The value n must be less
than or equal to the total number of available PNs. Currently this switch assigns a
range that begins at PN address 0. This switch cannot be specified on the same
command line as the -pn_range switch.

-pn_range range

A range of PN addresses of the form x-y, specifying the first and last address in this
range. The -pn_range range switch can be specified more than once, although
ranges cannot overlap. Use -pn_range to configure partitions more precisely than the
-size switch allows. This switch cannot be specified on the same command line as
the -size switch.

-iop integer_address

Supported for the CM-5 only, this switch specifies an I/O processor to be associated
with the partition being created. This argument is needed to support cmdiag tests that
involve I/O. integer_address is the I/O processor's network address.
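
The -pn_range rules (the form x-y, repeatable, no overlap) can be illustrated with a short Python sketch. This is not how cmpartition itself parses its arguments; the helper names parse_pn_range and check_pn_ranges are hypothetical.

```python
def parse_pn_range(text):
    """Parse one range of the form 'x-y' into (first, last), both inclusive."""
    first_text, last_text = text.split("-")
    first, last = int(first_text), int(last_text)
    if first > last:
        raise ValueError(f"range {text!r} ends before it starts")
    return first, last


def check_pn_ranges(range_texts):
    """Return the parsed ranges sorted by first address; reject overlaps."""
    parsed = sorted(parse_pn_range(text) for text in range_texts)
    # After sorting, an overlap exists exactly when a range starts at or
    # before the last address of the range that precedes it.
    for (_, prev_last), (next_first, _) in zip(parsed, parsed[1:]):
        if next_first <= prev_last:
            raise ValueError("PN ranges overlap")
    return parsed
```

For example, check_pn_ranges(["64-127", "0-63"]) accepts the two ranges, while adding "32-95" to that list raises an error.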

cmpartition start

This command initializes the partition configured for the specified partition manager and starts up
the timesharing access mode on that partition manager. After the cmpartition start command is
executed, users can run programs on that partition. cmpartition start must be called from the PM
of the partition you wish to start (usually via rsh from the system console).

-pm hostname

The hostname of the unique partition manager of the partition to be started. If neither
this switch nor -name is included on the command line, the partition started is the
one managed by the PM on which cmpartition start is executed.

-name partition_name

The partition's unique name, given by cmpartition create. There is no default value.

-n int

The number of times the timesharing daemon (ts-daemon) should automatically be
restarted upon failure. Default is 10.

-reva

Notifies the operating system that some or all of the processing nodes' NI chips are
revision A chips, which require special handling. The -reva switch is required if there
are any revision A chips in the system; if there are no revision A chips (the usual
case), do not specify -reva.

To determine the revision status of your CM-5 system's NI chips, examine the output
of the command demni (executed on a CP): chips marked Phoenix are revision A,
while chips marked Phoenix II are revision B.

-cmd command_name command_arg1 ... command_argn

A command followed by its arguments. No other switches can follow the -cmd
switch since they would be interpreted as one of the command's arguments.
Currently there is no need to specify any command but ts-daemon, which starts the
timesharing daemon running on the partition.

cmpartition stop

This command terminates the timesharing access mode on the partition manager from which this
command is executed. cmpartition stop must be called from the PM of the partition you wish to
stop (usually via rsh from the system console). After the cmpartition stop command is executed,
users can no longer run programs on that partition. Unless the partition has also been deleted
from the partitions.current file, it can be restarted simply by executing the cmpartition start
command (that is, the cmpartition create command is not necessary).

-pm hostname

The hostname of the unique partition manager of the partition to be stopped. If neither
this switch nor -name is included on the command line, the partition stopped is
the one managed by the PM on which cmpartition stop is executed.

-name partition_name

The partition's unique name, as given by cmpartition create. There is no default
value.

cmpartition delete

This command deallocates the resources of the partition and removes its definition from the file
/etc/cm/configuration/partitions.current. The cmpartition delete command must be called from
the system console.

-pm hostname

The hostname of the unique partition manager associated with the partition that you
wish to delete. If neither this switch nor -name is included on the command line, the
partition deleted is the one managed by the PM on which cmpartition delete is
executed.

-name partition_name

The partition's unique name, as given by cmpartition create. There is no default
value.

CONFIGURATION GUIDELINES 

Partitions must be configured carefully so as not to strand PNs or cause unnecessary competition on
shared resources such as the data network. This section contains a brief discussion of the rules
governing the size and distribution of partitions under Version 7.1 of CMOST. As the operating system
matures, these rules are expected to become considerably more liberal. The purpose of the current
restrictions is to ensure that user applications running in one partition are protected as fully as
possible from being corrupted by processes running in other partitions.

Following the rules listed below will ensure reliable partition isolation. It is sometimes possible to
create viable partitions that deviate from these rules, but we do not recommend doing so. (Note that
cmpartition create will try to accommodate any creation request; it is up to the user to understand
the configuration if the rules are not followed.)

1. The number of PNs in a partition must be a power of 2. This rule is further defined according to
Connection Machine model:

CM-5: A partition must contain at least 32 PNs. (The only exception to this rule is a CM-5 that has
a total of 16 PNs; all 16 PNs must be configured as one partition.)

CM-5-LD: A partition must contain either 16 or 32 PNs; that is, the CM-5-LD can support one
partition of 32 PNs or two partitions of 16 PNs each.

2. The PNs within a partition must be contiguous in the address space, except when there is only
one partition.

3. The first PN of a contiguous set must start on an address where:

address MOD partition_size = 0

4. There must be a partition manager for each partition. Each PM can manage only one partition.

5. (CM-5 only) The maximum number of partitions that a CM-5 can accommodate
is a function of the total number of PNs in the Connection Machine:

CM-5 SIZE       MAXIMUM NUMBER
(IN # PNs)      OF PARTITIONS

16              1
32, 64          2
128, 256        4
512, 1K         8
2K, 4K          16
8K              32

For example, these are some possible partitionings of a CM-5 with 256 PNs and 4 PMs:

1 partition of 256 PNs.

2 partitions, each of 128 PNs.

3 partitions, two of 64 PNs and one of 128 PNs.

4 partitions, two of 32 PNs, one of 64 PNs, and one of 128 PNs.

4 partitions, each of 64 PNs.

The above partitionings use all the system's PNs; of course, you can
set up a partition configuration that does not include all available PNs.
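
Rules 1, 3, and 5 lend themselves to a mechanical check. The Python sketch below applies them to a proposed CM-5 layout. It is an illustration only, not a tool shipped with CMOST: the function name check_cm5_layout is hypothetical, the table entries 1K through 8K are assumed to mean 1024 through 8192 PNs, and each partition is given as an inclusive (first, last) address pair, so rule 2 (contiguity) holds by construction. Rule 4 (one PM per partition) and overlap between partitions are not checked.

```python
# Rule 5 table: total PNs in a CM-5 -> maximum number of partitions.
# Assumption: "1K" .. "8K" in the table mean 1024 .. 8192 PNs.
MAX_PARTITIONS = {16: 1, 32: 2, 64: 2, 128: 4, 256: 4,
                  512: 8, 1024: 8, 2048: 16, 4096: 16, 8192: 32}


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


def check_cm5_layout(total_pns, partitions):
    """Return a list of rule violations for a CM-5 layout (empty if none).

    `partitions` is a list of (first, last) PN addresses, both inclusive.
    """
    problems = []
    limit = MAX_PARTITIONS.get(total_pns)
    if limit is None:
        problems.append(f"{total_pns} PNs is not a size listed in the table")
    elif len(partitions) > limit:
        problems.append(f"rule 5: more than {limit} partitions")
    # Rule 1: at least 32 PNs, except on a 16-PN CM-5.
    minimum = 16 if total_pns == 16 else 32
    for first, last in partitions:
        size = last - first + 1
        if not is_power_of_two(size):
            problems.append(f"rule 1: size {size} is not a power of 2")
        if size < minimum:
            problems.append(f"rule 1: size {size} is below {minimum} PNs")
        # Rule 3: address MOD partition_size must be 0.
        if size > 0 and first % size != 0:
            problems.append(f"rule 3: address {first} is not a multiple of {size}")
    return problems
```

Applied to the third example partitioning above, check_cm5_layout(256, [(0, 63), (64, 127), (128, 255)]) reports no problems, whereas a 64-PN partition starting at address 32 is flagged under rule 3.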

EXAMPLES 

% cmpartition list -l

CM System "Sand"

256 Processors [ 8 Mbytes memory, SPARC IU, SPARC FPU ]

2 Partition_Managers
beethoven.think.com
haydn.think.com
Available PN Ranges:
All PNs in use

Name  Partition_Manager  Size  State  Nodes  Description

beethoven.think.com  128  ALLOCATED  0-127  beethoven.think.com
haydn.think.com  128  ALLOCATED  128-255  haydn.think.com

% cmpartition delete -pm beethoven.think.com
% cmpartition delete -pm haydn.think.com

% cmpartition create -pm beethoven.think.com -pn_range 0-63
% cmpartition start -pm beethoven.think.com -cmd ts-daemon
% cmpartition stop -pm beethoven.think.com


FILES 

/etc/cm/hardware.install A description of the Connection Machine hardware as installed.

/etc/cm/partitions.current A description of all currently configured partitions.


SEE ALSO

ts-daemon(8), hardware.install(8), cmbes(8)


BUGS

It is recommended that all cmpartition commands be initiated from the system console. Use the remote
shell (rsh) command to run cmpartition start and cmpartition stop on the PM that manages the
pertinent partition. This is presently necessary to preserve resource allocation consistency.
NAME

dvcoldboot - Powers up, spares, and heals the DataVault; initializes DataVault configuration variables.

SYNTAX

dvcoldboot [ +sD,S | -sD,S | -pP,I | +aP | -aP | -bP,N | -n | -i | +cC | -cC | -help ]

ARGUMENTS

+sD,S Replace (faulty) drive D with spare S. D must be less than 39, and S must be 0, 1, or 2.

-sD,S Replace spare S with (repaired) drive D. D must be less than 39, and S must be 0, 1, or 2.

-pP,I Set station ID on port P of the DataVault to the value of I. P must be 0 or 1, and I must
be less than 16.

+aP Set port P to be the arbiter on the bus. P must be 0 or 1.

-aP Set port P to not be the arbiter on the bus. P must be 0 or 1.

-bP,N Set bus ID for port P to the value of N. P must be 0 or 1, and N must be less than 256.

-n Use no spares.

-i Initialize the configuration file and spare settings.

+cC After power-up only, turns on command-channel mode and selects port C to be a
command channel. C must be 0 or 1. This flag is valid on CM-5 systems only.

-cC Turns off command-channel mode on port C. This flag is valid for CM-5 systems only.

-help Print on the screen information about dvcoldboot.
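
The value limits stated for the arguments above can be collected into a few range checks. The Python sketch below is illustrative only and is not part of dvcoldboot; the function names are hypothetical, and the lower bound of 0 for drive, station ID, and bus ID values is an assumption (the text states only the upper limits).

```python
VALID_PORTS = (0, 1)
VALID_SPARES = (0, 1, 2)


def spare_args_ok(drive, spare):
    """Check D and S for +sD,S or -sD,S: D below 39, S one of 0, 1, 2."""
    return 0 <= drive < 39 and spare in VALID_SPARES


def station_id_args_ok(port, station_id):
    """Check P and I for -pP,I: P is 0 or 1, I below 16."""
    return port in VALID_PORTS and 0 <= station_id < 16


def bus_id_args_ok(port, bus_id):
    """Check P and N for -bP,N: P is 0 or 1, N below 256."""
    return port in VALID_PORTS and 0 <= bus_id < 256
```

For instance, spare_args_ok(38, 2) is accepted, while spare_args_ok(39, 0) and bus_id_args_ok(2, 10) are rejected.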

WHERE EXECUTED 

DataVault file server computer. 

DESCRIPTION 

The command dvcoldboot is used when powering up the DataVault, when sparing and healing the
DataVault, or when setting a bus ID, station ID, or bus arbiter. If no argument is specified, dvcoldboot
initializes the configuration variables, using the values stored in the DataVault's configuration file,
/usr/local/etc/diag/dv_coldboot.config. Whenever dvcoldboot executes, it automatically stores any new
configuration settings in this file.

Powering Up the DataVault

dvcoldboot must be executed when the DataVault is initially powered up or restarted and after
DataVault diagnostics are executed. If the DataVault computer crashes, dvcoldboot automatically executes
when the file server is rebooted.

dvcoldboot downloads the DataVault's microcode and allocates the disk drives according to the
configuration file; it also sets the bus ID, station ID, and arbitration status for both DataVault ports
according to the configuration file.

Configuring the DataVault

When dvcoldboot is executed with configuration arguments (-p, +a, -a, -b), the utility updates the
DataVault's configuration file, /usr/local/etc/diag/dv_coldboot.config, which resides on the MicroVAX.

If the configuration file is missing (for example, because it has been accidentally deleted), dvcoldboot
issues a warning. Execute dvcoldboot with the -i option to recreate the fields in the file; then execute
dvcoldboot with configuration arguments to update the configuration settings.
Ignore-errors (i)

prevents any errors from being reported.

Loop-forever (f)

causes a test to loop forever through all subtests (tests that it calls) when it encounters an
error. This environment variable is often useful in the field and is ordinarily enabled during
troubleshooting diagnostics. Ctrl-C aborts this option.

Display-error-count (d#)

allows you to control how many errors the diagnostic program will display or log for each
test. The default error count to be displayed is 16. You can change this variable by entering a
decimal value as an integer argument.

Log-errors (l) (default option)

causes the error handler to write all error messages to a log file rather than display them. In the
current implementation, this file is named diag-error-log and is located in /usr/local/etc/diag.

Display-trace (t)

allows you to display or inhibit messages that are built into tests with the TRACE (("msg"))
macro. It is intended for use in a manufacturing environment and is ordinarily disabled in the
field.

Executed with the -m or -f arguments, hippidiag runs the requested predefined diagnostic test suite. To
run a subset of the tests, specify the -g option with one or more groupnames. The groupnames are listed
below.

Executed without the -m, -f, or -g options, hippidiag immediately provides a command-line interpreter,
represented by the prompt <hippi-DIAG>, which supports the four sub-diagnostic packages as well as
the individual tests that comprise the predefined diagnostic programs. To run a sub-diagnostic from the
<hippi-DIAG> prompt, simply type the name of the sub-diagnostic and press the Return key. The
sub-diagnostic prompt will then appear. For example,

<hippi-DIAG> srcdiag
<SRC-DIAG>

If you append -C to the command line above, the sub-diagnostic prompt will appear followed by a list
of the tests that you can run at the sub-diagnostic prompt.

Generally, it is best to run all test groups within a sub-diagnostic, check the results, and then rerun any
failed tests individually. Before rerunning the failed tests, either exit and re-enter the sub-diagnostic, or
reset the 29K board by typing

<SRC-DIAG> reset29k

Any error messages generated by the tests are sent to standard error and standard output. The error
messages are also logged in /usr/local/etc/diag/diag-error-log on the CM-HIPPI.

The srcdiag Sub-Diagnostic

srcdiag is a standalone diagnostic package for the CM-HIPPI's source board. It consists of three groups
of tests:

The all-ktest group, listed below, contains 29 tests (ktest0-ktest28). These tests diagnose and
verify the functionality of the source board's 29K-side registers. See the Restrictions section of
this man page.

ktest0-vme-command-reg-read             ktest1-vme-command-reg-write
ktest2-check-reset-reg                  ktest3-access-err-force-parity
ktest4-check-IFIFO-status               ktest5-EPROM-checksum
ktest6-IRAM-address-lines               ktest7-IRAM-memory-check
ktest8-DRAM-address-lines               ktest9-DRAM-memory-check
ktest10-DRAM-byte-access                ktest11-SM-FIFO-echo
ktest12-VME-side-IFIFO-status           ktest13-HPPI-side-IFIFO-status
ktest14-ofifo-status                    ktest15-LED-marching-pattern
ktest16-RS232-config-DIP-switch         ktest17-VME-INT-parity-error
ktest18-VME-INT-bus-error               ktest19-VME-INT-SM1-FIFO-empty
ktest20-VME-INT-SM0-FIFO-ready          ktest21-VME-INT-HPPI-request
ktest22-software-trap-register          ktest23-HPPI-INT-SMDIF-ready
ktest24-HPPI-INT-SM-IACK                ktest25-HPPI-INT-SMDOF-empty
ktest26-HPPI-INT-SMDIF-parity           ktest27-HPPI-fifo-reset-bits
ktest28-read-dip-switch

The all-stest group contains 17 tests (stest0-stest16). These tests verify and diagnose the
functionality of the source board's VME-side (Sun-side) registers.

stest0-hppi-reset                       stest1-hppi-data-fifo-in-status
stest2-hppi-data-fifo-out-status        stest3-hppi-data-fifo-read-write
stest4-event-fifo-write-status          stest5-event-fifo-read-status
stest6-event-fifo-read-write            stest7-send-packet-in-standalone
stest8-force-parity-error-send-burst    stest9-read-write-iop-target-ram
stest10-total-counter-read-write        stest11-iop-counter-read-write
stest12-force-SMDIF-parity-error        stest13-force-DRAM-ODD-parity-error
stest14-force-DRAM-EVEN-parity-error    stest15-force-IRAM-ODD-parity-error
stest16-force-IRAM-EVEN-parity-error    stest17-reset-hppis-from-vme-side

The src-board-test consists of all the tests in the all-ktest group and all the tests in the all-stest
group.

You can run these test groups either via hippidiag -g, or by typing run-groups test-group-name at the
<SRC-DIAG> prompt. For example,

<SRC-DIAG> run-groups all-ktest

The tests that make up the test groups can also be run individually at the <SRC-DIAG> prompt.
Completion mode is available: type the first letter of the test and press the Esc key to step through all tests
beginning with that letter. If you want to run the test, press the Return key; otherwise, press the Esc key
to continue stepping through tests.

Check the test results of tests run individually by typing score, a space, and the name of the test
(completion mode is available). For example,

<SRC-DIAG> score stest3-hppi-data-fifo-read-write
NAME

hippi_loop - Exercises the CM-HIPPI system.


SYNTAX

hippi-loop [ -rwDUvghR | -iifield | -ppattern | -ssize | -nNreps ]

-r Only test reading from the HIPPI channel to the CM.

-w Only test writing from the CM to the HIPPI channel.

-D Drop the connection between tests.

-U Use the existing connection if possible.

-v Be verbose when displaying information about connection setup and termination.

-g Display the status of the CM-HIPPI source and/or destination board.

-h Use the HIPPI ports rather than the loopback ports.

-R Test a connection to a remote system; a matching process is running at the remote end of the HIPPI
channel.

-iifield Use ifield as the I-field when establishing the loopback connection.

-ppattern Use pattern to create the data being transferred. pattern may be data-equal-address, random,
or a hexadecimal number (for example, 0x0) for a constant pattern.

-ssize Transfer size bytes of data. For size, you may specify 16k, 32k, 64k, 256k, 1m, 2m, or 4m.
(You may use uppercase or lowercase letters for k and m.)

-nNreps Repeat each pattern Nreps times.
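
The accepted values for size can be expressed as a small lookup. The Python sketch below is illustrative and is not taken from hippi-loop; the function name size_in_bytes is hypothetical, and it assumes that k means 1024 bytes and m means 1024 * 1024 bytes, which the text above does not state.

```python
# Sizes listed for the -s option; letters may be upper or lower case.
ALLOWED_SIZES = ("16k", "32k", "64k", "256k", "1m", "2m", "4m")


def size_in_bytes(text):
    """Convert a listed size such as '64k' or '2M' to a byte count.

    Assumption: k = 1024 bytes and m = 1024 * 1024 bytes.
    """
    key = text.lower()
    if key not in ALLOWED_SIZES:
        raise ValueError(f"size must be one of {', '.join(ALLOWED_SIZES)}")
    multiplier = 1024 if key.endswith("k") else 1024 * 1024
    return int(key[:-1]) * multiplier
```

Under that assumption, size_in_bytes("16K") gives 16384, and a value outside the list, such as "128k", is rejected.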


DESCRIPTION

The hippi_loop system exerciser runs code with CMFS library calls to verify that data transfers between
the CM and the CM-HIPPI occur successfully. Test data completes a circuit by looping from the
destination board to the source board on the CM-HIPPI.


If you do not supply the -h option, hippi-loop attempts to use the loopback ports on the source and
destination boards; be sure these ports are connected with the loopback cable supplied with the system. If
you want hippi-loop to use the HIPPI ports rather than the loopback ports, attach a cable (at most 25
meters long) to the IN and OUT ports on the CM-HIPPI bulkhead and issue hippi-loop with the -h
option.


If you do not supply a pattern, hippi-loop uses a default set of patterns and tests each pattern once.
If you do not supply the -r or -w option, hippi-loop tests both reading from the CM-HIPPI and writing
to the CM-HIPPI.


The hippi-loop command does not provide diagnostic information; it simply reports whether or not the 
data transfer tests were successful. True diagnostic tests must be run to isolate failed components. 


SEE ALSO 

dvtest5 

hippidiag 
NAME

/usr/etc/io_cold_boot - Boot the IOBAs in a system

SYNOPSIS

io_cold_boot [ -B Download executable file ] [ -D logical channel index ] [ *1 IOBA NI address ]
[ -I IO configuration file ] [ -K Kernel executable file ] [ -L Log file ] [ -m memory size ]
[ -T timeout in seconds ] [ -x MMU control register value ]


DESCRIPTION

io_cold_boot downloads and boots the IOPN kernel in the IOBA subsystem. The IOPN is the PN that
resides on the IOBA's controller board. io_cold_boot must be run from the PM of an active partition;
before running io_cold_boot, you must:

1. Set the environment variables CMDIAG_PATH and JTAG_SERVER. (If these are not set,
io_cold_boot prints an error message and exits.)

2. Execute cmreset on the System Administration Console.

3. Execute cmreset -s on the PM that will manage the partition created and started in Step 4.
(Even if this PM is the same control processor as the System Administration Console, cmreset
-s must be executed subsequent to executing cmreset with no switches.)

4. Execute cmpartition create and cmpartition start. (This obviates use of the -S flag, formerly
used to specify the physical NI address of the CP. If the -S flag is specified, it is now ignored.)
You need not start the ts-daemon when executing cmpartition start, but you can; io_cold_boot
can run regardless of whether the timesharing daemon is running.

io_cold_boot takes several switches. In a standard configuration, no switches are required. However,
the environment variables CMDIAG_PATH and JTAG_SERVER must be set. If they are not,
io_cold_boot will print an error message and exit.
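
Because io_cold_boot exits when CMDIAG_PATH or JTAG_SERVER is unset, a wrapper script can check for them first. The Python sketch below shows one way to do so; it is not part of io_cold_boot, and the function name missing_io_cold_boot_variables is hypothetical.

```python
import os

# The two variables the text above says must be set before io_cold_boot runs.
REQUIRED_VARIABLES = ("CMDIAG_PATH", "JTAG_SERVER")


def missing_io_cold_boot_variables(env=None):
    """Return the required variables that are unset or empty, in listed order.

    `env` is any mapping of names to strings; it defaults to the process
    environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]
```

An empty result means both variables are present; otherwise the returned names can be reported to the operator before any reset or partition command is issued.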


ARGUMENTS 

-B specifies the executable to download the IO kernel (default = /usr/etc/io_download).

-D specifies the logical index of an IOBA channel that should be marked as offline or
"down".

A specifies that only one IOBA (at the given NI address) is to be booted. The default is
that all IOBAs listed in io.conf are booted.

4 specifies the IO configuration file (default = /etc/io.conf).

-K specifies the IOPN kernel executable file (default = /usr/etc/io_kerneKhw

4 sets the log level. This is a bit mask which specifies which modules should send
messages to the log file. By default, only messages from error handlers are logged.
To turn on more verbose logging, set this value to 0x108.

-L sets the log file (default = /dev/tty).

-m sets the memory size of the IOPN in megabytes (default = 8).

-T sets the timeout to download one IOBA in seconds. This operation usually takes 45
seconds. The default timeout is set to 380 seconds.

-x sets the value downloaded to the MMU control register on the IOPN. The default
enables the cache in write-thru mode. To disable the cache, set this value to 0x1; to
enable the cache in copy-back mode, set this value to 0x501.
SEE ALSO

ts-daemon(8), cmpartition(8)
NAME

viodiag - Executes diagnostic tests for a CMIOP or VMEIO host computer.

SYNTAX

viodiag [ -m | -f | -ggroupname | -i | -C | +Efibld#t | -Efibld#t | -sfilename ]

ARGUMENTS

-m Execute manufacturing diagnostic tests for the CMIOP or VMEIO host computer.

-f Execute field service diagnostic tests for the CMIOP or VMEIO host computer.

-g Execute tests for groupname only.

-i Include interactive tests.

-C Enable command completion.

+E Set diagnostic environment (activate options).

-E Set diagnostic environment (deactivate options).

f = loop forever
i = ignore errors
b = break on error (default)
l = log errors (default)
d# = display error count (default = 16)
t = display trace messages

-s Execute a viodiag shell file given by filename.

WHERE EXECUTED

CMIOP

VMEIO host computer

DESCRIPTION

viodiag is the diagnostic program for CM-IOPs and VMEIO host computers.

Command completion mode (-C option) lets you type the first few letters of any viodiag command
and use the ESC key to complete the command.

The -g option of viodiag allows you to run selected groups of tests. The test groups and their titles are
listed below. To run a test group, enter its title at the command line. For example, to run all tests in the
VME group in field mode, enter at the command line

viodiag -f -gVME

viodiag Test Groups


CMIO-BUS-TEST                           CMIO-INTERRUPTS
test-cmio-slave-busy-nak                test-cmio-no-arbiter-interrupt
test-self-target-select                 test-cmio-master-done-interrupt
test-sender-id-check                    test-cmio-slave-done-interrupt
                                        test-cmio-overflow-timeout-interrupt
                                        test-cmio-target-select-timeout
                                        test-cmio-port-parity-interrupt
                                        test-vmeio-generate-exception-interrupt


CMIO-FIFO-FLAGS                         CMIO-PORT-LOOPBACK
test-cmio-input-empty-flag              test-cmio-data-bus
test-cmio-input-half-full-flag          test-cmio-port-data-loopback
test-cmio-status
test-cmio-status-latches                control-bus-on-exp


CMIO-PARITY                             INTERACTIVE
test-cmio-port-parity-gen               test-leds


MASTER-STATUS                           RAM-FIFO-FLAGS
test-vmeio-master-status-read-mode      test-ram-fifo-empty-flag
test-vmeio-master-status-write-mode     test-ram-fifo-full-flag


RAM-PARITY                              REGISTERS
test-ram-parity-gen                     test-cm-data-bus-low
test-ram-parity-rams                    test-cm-data-bus-high
                                        test-status-reg-after-reset
                                        test-command-reg
                                        test-setup-reg
                                        test-vme-address-reg
                                        test-vme-count-reg
                                        test-read-pointer-reg
                                        test-write-pointer-reg
                                        test-word-count-reg
                                        test-data-reg
                                        test-ram-port-register


SLAVE-FIFO-RAM                          SLAVE-MAPPED-RAM
test-fifo-ram-slave-wr-slave-rd         test-unaligned-ram-transfer
                                        test-mapped-ram


SLAVE-TIMEOUT                           VME
test-data-underflow-vme-timeout         test-vme-data-bus
test-data-overflow-vme-timeout


VME-ADDRESS-GEN-TEST-MODE               VME-ADDRESS-GENERATOR
test-vme-address-generator              test-vme-address-gen-master-rd
test-write-pointer-increment            test-vme-address-gen-master-wr
test-read-pointer-increment             test-master-wr-shutoff-on-fifo-empty


VME-INTERRUPT                           VME-MASTER
test-vme-master-done-interrupt          test-ram-loop-master-read
test-ram-parity-interrupt               test-ram-loop-master-write
                                        test-ram-loop-master-transfer


VME-MASTER-TIMEOUT                      VMEIO-CM-TRANSFERS
test-vio-master-read-vme-timeout        test-data-transfer-vmeio-to-from-cmioc
test-vio-master-write-vme-timeout       test-vmeio-exception-generation
                                        test-vmeio-slaveship
                                        test-vmeio-exception-reception
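
A group name can be checked against the list above before a command line is built. The following is a minimal Python sketch and is not part of viodiag: the helper name build_group_command and the VIODIAG_GROUPS set are illustrative assumptions, with the group names transcribed from the list above. It only assembles an argument list in the form of the earlier example (viodiag -f -gVME); it does not run anything.

```python
# Hypothetical helper, not part of viodiag. It assembles the argument list
# for running one test group, following the documented form "viodiag -f -gVME".

# Group names transcribed from the "viodiag Test Groups" list (assumed complete).
VIODIAG_GROUPS = {
    "CMIO-BUS-TEST", "CMIO-INTERRUPTS", "CMIO-FIFO-FLAGS",
    "CMIO-PORT-LOOPBACK", "CMIO-PARITY", "INTERACTIVE",
    "MASTER-STATUS", "RAM-FIFO-FLAGS", "RAM-PARITY", "REGISTERS",
    "SLAVE-FIFO-RAM", "SLAVE-MAPPED-RAM", "SLAVE-TIMEOUT", "VME",
    "VME-ADDRESS-GEN-TEST-MODE", "VME-ADDRESS-GENERATOR",
    "VME-INTERRUPT", "VME-MASTER", "VME-MASTER-TIMEOUT",
    "VMEIO-CM-TRANSFERS",
}

# -f selects field service tests, -m selects manufacturing tests.
MODE_FLAGS = {"field": "-f", "manufacturing": "-m"}


def build_group_command(group, mode="field"):
    """Return the argument list that runs one test group in the given mode."""
    if group not in VIODIAG_GROUPS:
        raise ValueError("unknown viodiag test group: %r" % (group,))
    if mode not in MODE_FLAGS:
        raise ValueError("mode must be 'field' or 'manufacturing'")
    # The group name is attached directly to -g, as in "-gVME".
    return ["viodiag", MODE_FLAGS[mode], "-g" + group]
```

Rejecting an unknown group before anything is executed keeps a mistyped name from reaching the diagnostic environment. Manufacturing mode is included only for completeness; see RESTRICTIONS before using it.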


Use viodiag -s to run diagnostic tests in a sequence and frequency that you select. 

Construct a diagnostics shell file by creating a file, filename, of viodiag test commands. When viodiag 
-sfilename runs, the commands within filename execute. 

For example, below is a diagnostics shell file, filename. When viodiag -sfilename executes, the two tests 
run. 


test-vme-address-gen-master-rd 1024 m 
test-vme-address-gen-master-wr 2048 m 
quit 
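
A shell file like the one above can also be generated by a script. The following Python sketch is illustrative only: the function name write_viodiag_shell_file is an assumption, and ending the file with quit simply mirrors the example above rather than a documented requirement. The meaning of the trailing test arguments (1024 m, 2048 m) is not described on this page, so the sketch passes each test line through unchanged.

```python
from pathlib import Path


def write_viodiag_shell_file(path, test_lines):
    """Write a viodiag shell file and return the matching argument list.

    Hypothetical helper, not part of viodiag. Each entry in test_lines is
    written verbatim as one command line; a final "quit" is appended if it is
    missing, mirroring the example shell file (an assumption, not a
    documented requirement).
    """
    # Drop empty entries and surrounding whitespace.
    lines = [line.strip() for line in test_lines if line.strip()]
    if not lines or lines[-1] != "quit":
        lines.append("quit")
    Path(path).write_text("\n".join(lines) + "\n")
    # The file name is attached directly to -s, as in "viodiag -sfilename".
    return ["viodiag", "-s" + str(path)]
```

Called with the two test lines from the example above and a path of your choice, the function writes those lines followed by quit and returns the corresponding viodiag -s argument list.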

Executed with or without arguments, viodiag starts the diagnostic environment running, which is
represented by the prompt <vio-diag>.

The tests listed within groups can be run individually at the <vio-diag> prompt. In addition, the
following tests can be run individually:


write-vmem-byte                         write-pmem-long
wr-rd-ver-cmio-setup-req                write-register
wr-rd-ver-data-req                      wr-rd-ver-command-req
wr-rd-ver-read-pointer-req              wr-rd-ver-ram-port-req
wr-rd-ver-vme-address-req               wr-rd-ver-setup-req
wr-rd-ver-word-count-req                wr-rd-ver-vme-count-req
wr-rd-ver-write-pointer-req

In addition to tests, at the prompt you can also run the following commands: 

alias command_name                      Display the alias for command_name.
continue-from-abort                     Continue to run the test sequence.
help                                    Display a help menu.
list-commands                           Display all viodiag commands and tests.
list-groups                             Display the viodiag test groups.
reset                                   Do a hard reset of the CMIOP or VMEIO
                                        host computer.
set-diag-environment                    Set the diagnostic environment variables.
show-diag-environment                   Display the setting of the diagnostic
                                        environment variables.
show-all-errors                         Display all the errors generated by this
                                        execution of viodiag.
whatis command_name                     Display a brief man page for command_name.


The following troubleshooting utilities can also be run at the <vio-diag> prompt:

display-board-status                    read-pmem-block
display-dram-status                     read-register
display-fifo-status                     read-vmem-byte
display-ram-fifo-contents               display-cmio-status
cmio-read-register                      search-dram-contents
display-interrupt-status                display-ram-status
read-pmem-long                          read-vmem-block

RESTRICTIONS

Do not run viodiag manufacturing tests if the CMIOP or VMEIO host computer is connected to any
CMIO bus; the entire CMIO bus system will be unpredictably affected. (These tests may change the
CMIOP's or VMEIO host computer's status, that is, its station ID and arbiter.)

Be sure there is an arbiter on the bus before running viodiag in field mode.

SEE ALSO 

viodiag 

dvcoldboot 

fsserver